BDA Unit 5
• Schema: SQL databases have a fixed (static, predefined) schema; NoSQL databases have a dynamic schema.
• Hardware: SQL requires specialized DB hardware for better performance; NoSQL uses commodity hardware.
• Examples: SQL: Oracle, Postgres, and MS-SQL. NoSQL: MongoDB, Redis, Neo4j, Cassandra, HBase.
• Consistency: SQL provides strong consistency; many NoSQL databases provide eventual consistency (except MongoDB).
Introduction
• MongoDB is a database management system designed to rapidly develop web
applications and internet infrastructure.
• The data model and persistence strategies are built for high read-and-write
throughput.
• It has the ability to scale easily with automatic failover.
• MongoDB stores its information in documents rather than rows.
{
_id: 10,
username: 'ABC',
email: [
'[email protected]',
'[email protected]'
]
}
• MongoDB’s document format is based on JSON (JavaScript Object Notation).
• It is a known scheme for storing arbitrary data structures.
• JSON structures consist of keys and values.
• An RDBMS uses multitable joins to assemble the complete information about a data item that is distributed among many tables.
• With a document model, by contrast, most of a product’s information can be represented within a single document.
• A user can easily get a complete representation of any data item, with all its information hierarchically organized in a JSON-like structure, in the MongoDB JavaScript shell.
• The user can query the document and manipulate it.
• With MongoDB, an object defined in an object-oriented programming language can be persisted easily.
MongoDB’s key features
• Document data model:-
• MongoDB’s data model is document-oriented, which means that data is stored as
documents, and documents are grouped in collections.
• The documents in a single collection don't necessarily need to have exactly the same
set of fields.
• This is also called a “flexible schema.”
• This flexibility allows developers to iterate faster and migrate data between different
schemas without any downtime.
• Documents in MongoDB are stored in the BSON format, which is a binary-encoded
JSON format.
• Because the data is stored in a binary format, it can be processed much faster than JSON text.
• This also allows for the storage of binary data, which is useful for storing images, videos, and other binary data.
• Ad hoc queries:-
• When designing the schema of a database, it is impossible to know in advance all the
queries that will be performed by end users.
• An ad-hoc query is a short-lived command whose value depends on a variable.
• Each time an ad-hoc query is executed, the result may be different, depending on the
variables in question.
• Because MongoDB is a document-oriented, flexible-schema database, it supports such queries without requiring them to be defined in advance, which suits enterprise applications that require real-time analytics.
• Ad hoc query support lets developers write and adjust queries in real time, and tuning them can improve performance.
• Suppose you want to find all posts tagged with the term politics having more than 10
votes.
• A SQL query would look like this:
• SELECT * FROM posts
• INNER JOIN posts_tags ON posts.id = posts_tags.post_id
• INNER JOIN tags ON posts_tags.tag_id = tags.id
• WHERE tags.text = 'politics' AND posts.vote_count > 10;
• The equivalent query in MongoDB is specified using a document as a matcher.
• The special $gt key indicates the greater-than condition:
• db.posts.find({'tags': 'politics', 'vote_count': {'$gt': 10}});
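To make the idea of "a document as a matcher" concrete, here is a minimal sketch in plain JavaScript. It is not MongoDB's query engine: the `matches` function, the `posts` array, and the sample data are hypothetical names introduced only for illustration, and only equality, `$gt`, and `$lt` are handled.

```javascript
// Illustrative only: a tiny in-memory matcher, not MongoDB's implementation.
// `matches` and `posts` are hypothetical names.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === 'object' && !Array.isArray(cond)) {
      // Operator document such as { $gt: 10 }
      return Object.entries(cond).every(([op, operand]) => {
        if (op === '$gt') return value > operand;
        if (op === '$lt') return value < operand;
        throw new Error('unsupported operator: ' + op);
      });
    }
    // Equality: an array field matches if it contains the value.
    return Array.isArray(value) ? value.includes(cond) : value === cond;
  });
}

const posts = [
  { title: 'A', tags: ['politics', 'news'], vote_count: 12 },
  { title: 'B', tags: ['politics'], vote_count: 3 },
  { title: 'C', tags: ['sports'], vote_count: 40 },
];

// Same shape as the query document passed to find() above.
const result = posts.filter((p) =>
  matches(p, { tags: 'politics', vote_count: { $gt: 10 } })
);
console.log(result.map((p) => p.title)); // [ 'A' ]
```

Note that no join is needed: the tags live inside each post document, so a single pass over one collection answers the question.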
• Indexes:
• As more documents are added to the database and more ad hoc queries are run, searching for a value becomes increasingly expensive.
• Indexes are used to search through the data efficiently.
• Indexes are intended to improve search speed and performance.
• If an appropriate index exists for each query, user requests can be optimally executed
by the server.
• MongoDB offers a broad range of indexes and features with language-specific sort
orders that support complex access patterns to datasets.
• MongoDB indexes can be created on demand to accommodate real-time, ever-changing query patterns and application requirements.
• The primary key is generally indexed automatically so that each datum can be
efficiently accessed using its unique key.
• Not every database, however, lets you index the data inside a row or document.
• Indexes on such fields are called secondary indexes.
• Many NoSQL databases, such as HBase, are considered key-value stores because they don’t allow any secondary indexes.
• This is a significant feature of MongoDB: by permitting multiple secondary indexes, it lets users optimize for a wide variety of queries.
• With MongoDB, one can create up to 64 indexes per collection.
• Replication:-
• MongoDB provides database replication via a topology known as a replica
set.
• Replica sets distribute data across two or more machines for redundancy
and automate failover in the event of server and network outages.
• Additionally, replication is used to scale database reads.
• A replica set consists of several MongoDB servers, called nodes, usually with each server on a
separate physical machine.
• At any given time, one node serves as the replica set primary node and one
or more nodes serve as secondaries.
• A replica set’s primary node can accept both reads and writes, but the secondary
nodes are read only.
• Replica sets are notable because they support automated failover.
• If the primary node fails, the cluster will pick a secondary node and automatically
promote it to primary.
• When the former primary comes back online, it’ll do so as a secondary.
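The failover behaviour described above can be sketched with a toy model in plain JavaScript. This is only an illustration under simplifying assumptions: a real replica set holds an election among its members, whereas this sketch simply promotes the first available secondary. The `ReplicaSet` class and the node names are hypothetical.

```javascript
// Toy model of failover; not how MongoDB elects a primary internally.
// `ReplicaSet` and the node names are hypothetical.
class ReplicaSet {
  constructor(names) {
    // The first node starts as primary, the rest as secondaries.
    this.nodes = names.map((name, i) => ({
      name,
      up: true,
      role: i === 0 ? 'primary' : 'secondary',
    }));
  }

  primary() {
    return this.nodes.find((n) => n.up && n.role === 'primary') || null;
  }

  fail(name) {
    const node = this.nodes.find((n) => n.name === name);
    node.up = false;
    if (node.role === 'primary') {
      node.role = 'secondary'; // it will rejoin later as a secondary
      const next = this.nodes.find((n) => n.up);
      if (next) next.role = 'primary'; // automatic promotion
    }
  }

  recover(name) {
    const node = this.nodes.find((n) => n.name === name);
    node.up = true; // comes back online without reclaiming the primary role
  }
}

const rs = new ReplicaSet(['node1', 'node2', 'node3']);
rs.fail('node1');
console.log(rs.primary().name); // node2
rs.recover('node1');
console.log(rs.nodes.map((n) => n.name + ':' + n.role));
// [ 'node1:secondary', 'node2:primary', 'node3:secondary' ]
```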
[Figure: MongoDB’s key features: Replication]
• Speed and Durability:-
• In database systems there is an inverse relationship between write speed and
durability.
• Write speed can be understood as the volume of inserts, updates, and deletes that a
database can process in a given time frame.
• Durability refers to the level of assurance that these write operations have been made permanent.
• In MongoDB, users control the speed and durability trade-off by choosing write semantics and deciding whether to enable journaling.
• With journaling, writes are flushed to the journal file every 100 ms.
• If the server is ever shut down improperly, the journal is used to ensure that MongoDB’s data files are restored to a consistent state when the server restarts.
• This is the safety that journaling provides.
• Scaling:-
• The easiest way to scale most databases is to upgrade the hardware.
• There are two ways of scaling:-
• Vertical scaling: augmenting a single node’s hardware, also called scaling up.
• Vertical scaling has the advantages of being simple, reliable, and cost-effective up to a certain point.
• Horizontal scaling: distributing the database across multiple machines, also called scaling out.
• A horizontally scaled architecture can run on many smaller, less expensive machines.
• MongoDB was designed to make horizontal scaling manageable.
• It uses a partitioning mechanism known as sharding for horizontal scaling; data is partitioned by ranges of a shard key (hashed shard keys are also supported).
• Sharding automatically manages the distribution of data across nodes.
• The sharding system handles the addition of shard nodes, and it also facilitates
automatic failover.
• Individual shards are made up of a replica set consisting of at least two nodes, ensuring
automatic recovery with no single point of failure.
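Range-based partitioning can be illustrated with a small routing sketch in plain JavaScript. The chunk boundaries, shard names, and the `routeToShard` function are hypothetical and chosen only for illustration; in a real cluster the ranges are maintained by the sharding system rather than hard-coded.

```javascript
// Illustrative only: route a shard-key value to a shard by range.
// Each chunk covers [min, max) of the shard key; all values are hypothetical.
const chunks = [
  { min: -Infinity, max: 1000, shard: 'shardA' },
  { min: 1000, max: 5000, shard: 'shardB' },
  { min: 5000, max: Infinity, shard: 'shardC' },
];

function routeToShard(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

console.log(routeToShard(42));   // shardA
console.log(routeToShard(1000)); // shardB
console.log(routeToShard(9999)); // shardC
```

A router process plays this role in a sharded deployment: the application sends one request, and the router forwards it to whichever shard owns the matching range.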
[Figure: MongoDB’s key features: Scaling]
MongoDB’s Core Server and Tools
• MongoDB is written in C++ and actively developed by MongoDB, Inc.
• The project compiles on all major operating systems, including Mac OS X, Windows,
Solaris, and most flavors of Linux.
• MongoDB was released as open source under the GNU Affero General Public License (AGPL); releases since October 2018 use the Server Side Public License (SSPL).
• The source code is freely available on GitHub, and contributions from the community are frequently accepted.
• The project is nevertheless guided by the MongoDB, Inc. core server team.
• Core server:-
• The core database server runs via an executable called mongod.
• The mongod server process receives commands over a network socket using a custom
binary protocol.
• All the data files for a mongod process are stored by default in /data/db on Unix-like
systems and in c:\data\db on Windows.
• Configuring a mongod process is simple and can be accomplished both with command-
line arguments and with a text configuration file.
• mongod can run either as a standalone server or as a member of a replica set.
• When MongoDB’s sharding feature is used, mongod can also run in config server mode.
• A separate routing server called mongos is used to send requests to the appropriate shard in this kind of setup.
• JavaScript shell:-
• The MongoDB command shell is a JavaScript-based tool for administering the database
and manipulating data.
• The mongo executable loads the shell and connects to a specified mongod process, or
one running locally by default.
• To insert a document into a collection, use the insert command, which is a JavaScript expression:
• db.users.insert({name: "ABC"})
• The find command returns the inserted document, with an object ID added.
• All documents require a primary key stored in the _id field.
• Shell commands are also used for viewing the current database operations, checking the status of replication to a secondary node, and configuring a collection for sharding.
• Database drivers:-
• MongoDB drivers are easy to use.
• The driver is the code used in an application to communicate with a MongoDB server.
• All drivers have functionality to query, retrieve results, write data, and run database
commands.
• Each driver offers an API that matches the idioms of its language while maintaining a relatively uniform interface across languages.
• MongoDB, Inc. officially supports drivers for C, C++, C#, Erlang, Java, Node.js,
JavaScript, Perl, PHP, Python, Scala, and Ruby.
• Command-line tools:-
• MongoDB is bundled with several command-line utilities:
• mongodump and mongorestore: Standard utilities for backing up and restoring a database. mongodump saves the database’s data in its native BSON format and thus is best used for backups only; this tool has the advantage of being usable for hot backups, which can easily be restored with mongorestore.
• mongoexport and mongoimport: Export and import JSON, CSV, and TSV data. This is useful when data is needed in widely supported formats. mongoimport can also be good for initial imports of large data sets.
• mongosniff: A wire-sniffing tool for viewing operations sent to the database. It translates the BSON going over the wire to human-readable shell statements.
• mongostat: Similar to iostat, this utility constantly polls MongoDB and the system to provide helpful stats, including the number of operations per second (inserts, queries, updates, deletes, and so on), the amount of virtual memory allocated, and the number of connections to the server.
• mongotop: Similar to top, this utility polls MongoDB and shows the amount of time it spends reading and writing data in each collection.
• mongoperf: Helps you understand the disk operations happening in a running MongoDB instance.
• mongooplog: Shows what’s happening in the MongoDB oplog.
• bsondump: Converts BSON files into human-readable formats including JSON.
MongoDB through the JavaScript shell
• The MongoDB shell is the go-to tool for experimenting with the database,
running ad-hoc queries, and administering running MongoDB instances.
• Applications that use MongoDB are written with language drivers.
• The shell is used to test and refine the queries those applications run.
• All MongoDB queries can be run from the shell.
• The MongoDB shell lets users examine and manipulate data and administer the database server itself.
• MongoDB’s shell differs from others in its query language.
• Instead of using a standardized query language such as SQL, it uses the JavaScript programming language and a simple API.
• JavaScript scripts in the shell interact with a MongoDB database.
• Starting the shell:-
• Start the MongoDB shell by running the mongo executable:
• mongo
• O/P:- The shell heading displays the version of MongoDB running, and
additional information about the currently selected database.
• In MongoDB, a group of documents, similar to a table in an RDBMS, is called a collection.
• MongoDB divides collections into separate databases.
• If no other database is specified on startup, the shell selects a default database
called test.
• The command used to switch to a new database is
• > use tutorial
• switched to db tutorial
• MongoDB writes its data out to disk.
• All collections in a database are grouped in the same files, so from a memory perspective it makes sense to keep related collections in the same database.
• Inserts and queries:-
• As MongoDB uses a JavaScript shell, the documents will be specified in JSON.
• A document containing a username can be created as:
• {username: "smith"}
• The document contains a single key and value for storing Smith’s username.
• To save this document, choose a collection to save it to.
• db.users.insert({username: "smith"})
• O/P:- WriteResult({ "nInserted" : 1 })
• In the default MongoDB configuration, this data is now guaranteed to be inserted even if the shell is killed or the machine is suddenly restarted.
• User can issue a query to see the new document:
• db.users.find()
• Since the data is now part of the users collection, reopening the shell and
running the query will show the same result.
• O/P:-
• { "_id" : ObjectId("552e458158cd52bcb257c324"), "username" : "smith" }
• To count the number of documents in the collection, use:
• db.users.count()
• MongoDB supports a simple query selector to the find method.
• A query selector is a document that’s used to match against all documents in
the collection.
• db.users.find({username: "smith"})
• O/P:- { "_id" : ObjectId("552e542a58cd52bcb257c325"), "username" : "smith" }
• The query predicate {username: "smith"} returns all documents where the username is smith.
• When a query selector specifies several fields, the predicate ANDs them together, so a selector containing both _id and username matches only documents that match on both fields.
• MongoDB’s $and operator can also be used to state this explicitly:
• db.users.find({ $and: [ { _id: ObjectId("552e458158cd52bcb257c324") }, {
username: "smith" } ] })
• MongoDB also supports the $or operator:
• db.users.find({ $or: [{ username: "jones" },{ username: "smith" } ]})
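The difference between the implicit AND, the explicit `$and`, and `$or` can be seen in a small plain-JavaScript sketch. It is not MongoDB's implementation; `matchesQuery` and the sample `users` array are hypothetical, and only top-level equality is supported.

```javascript
// Illustrative only: implicit AND, $and, and $or over an in-memory array.
// `matchesQuery` and `users` are hypothetical names.
function matchesQuery(doc, query) {
  // Every key of the selector must hold: this is the implicit AND.
  return Object.entries(query).every(([key, cond]) => {
    if (key === '$and') return cond.every((q) => matchesQuery(doc, q));
    if (key === '$or') return cond.some((q) => matchesQuery(doc, q));
    return doc[key] === cond; // plain equality on a top-level field
  });
}

const users = [
  { _id: 1, username: 'smith' },
  { _id: 2, username: 'jones' },
  { _id: 3, username: 'brown' },
];

const implicitAnd = users.filter((u) =>
  matchesQuery(u, { _id: 1, username: 'smith' })
);
const explicitAnd = users.filter((u) =>
  matchesQuery(u, { $and: [{ _id: 1 }, { username: 'smith' }] })
);
const either = users.filter((u) =>
  matchesQuery(u, { $or: [{ username: 'jones' }, { username: 'smith' }] })
);

console.log(implicitAnd.length, explicitAnd.length); // 1 1
console.log(either.map((u) => u.username)); // [ 'smith', 'jones' ]
```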
• Updating documents:-
• All updates require at least two arguments.
• The first specifies which documents to update, and the second
defines how the selected documents should be modified.
• There are two general types of updates, with different properties and
use cases.
• 1. Update by applying modification operations to a document
• 2. Update by replacing the old document with a new one
• Operator Update:-
• This type of update involves passing a document with some kind of operator
description as the second argument to the update function.
• One such operator is $set, which sets a single field to the specified value.
• For e.g.:-
• db.users.update({username: "smith"}, {$set: {country: "Canada"}})
• O/P:-
• WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
• This update tells MongoDB to find a document where the username is smith,
and then to set the value of the country property to Canada.
• Replacement Update:-
• In this type of update, the matched document is replaced by another document.
• db.users.update({username: "smith"}, {country: "Canada"})
• O/P:-
• WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
• In this case, the document is replaced with one that contains only the country field, and the username field is removed.
• This is because the first document is used only for matching, and the second document replaces the document that was matched.
• To add or set fields rather than to replace the entire document, use the $set as:-
• db.users.update({country: "Canada"}, {$set: {username: "smith"}})
• O/P:-
• WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
• To see the updated document, use:
• db.users.find({country: "Canada"})
• O/P:-
• { "_id" : ObjectId("552e458158cd52bcb257c324"), "country" : "Canada",
"username" : "smith" }
• To remove a field that was added with the $set operator, use the $unset operator:
• db.users.update({username: "smith"}, {$unset: {country: 1}})
• O/P:-
• WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
• To see the results of unset use,
• db.users.find({username: "smith"})
• O/P:-
• { "_id" : ObjectId("552e458158cd52bcb257c324"), "username" : "smith" }
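The sequence of updates above (operator update, replacement, `$set` again, then `$unset`) can be traced on a plain JavaScript object. This is a sketch of the semantics only, not of MongoDB's code; `applyUpdate` is a hypothetical helper and handles just `$set` and `$unset`.

```javascript
// Illustrative only: mimics the two update styles on a plain object.
// `applyUpdate` is a hypothetical helper, not a MongoDB API.
function applyUpdate(doc, update) {
  const keys = Object.keys(update);
  const isOperatorUpdate =
    keys.length > 0 && keys.every((k) => k.startsWith('$'));

  if (!isOperatorUpdate) {
    // Replacement update: keep only _id, take everything else from `update`.
    return { _id: doc._id, ...update };
  }

  // Operator update: copy the document, then apply $set and $unset.
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const field of Object.keys(update.$unset || {})) {
    delete result[field];
  }
  return result;
}

const original = { _id: 1, username: 'smith' };
const withCountry = applyUpdate(original, { $set: { country: 'Canada' } });
const replaced = applyUpdate(withCountry, { country: 'Canada' });
const restored = applyUpdate(replaced, { $set: { username: 'smith' } });
const unset = applyUpdate(restored, { $unset: { country: 1 } });

console.log(withCountry); // username and country both present
console.log(replaced);    // username is gone, only _id and country remain
console.log(restored);    // username added back with $set
console.log(unset);       // country removed again
```

The second step is the one that surprises people: omitting the operator turns a small edit into a full replacement of the document.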
• Updating Complex Data:-
• MongoDB represents the data with documents, which can contain complex
data structures.
• For example, a favorites key can point to an object containing two other keys, which hold lists of favorite cities and colors:
• db.users.update( {username: "smith"},{ $set: { favorites: { cities: ["Chicago",
"Chennai"], colors: ["red", "blue", "green"] } } })
• O/P:-
• WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
• Deleting data:-
• If given no parameters, a remove operation will clear a collection of all its
documents.
• For e.g. db.foo.remove()
• To remove only a certain subset of a collection’s documents, pass a query
selector to the remove() method.
• For e.g. db.users.remove({"favorites.cities": "Chennai"})
• O/P:-
• WriteResult({ "nRemoved" : 1 })
• The remove() operation doesn’t actually delete the collection, it only removes
documents from a collection.
• To delete the collection along with all of its indexes, use the drop() method as,
• db.users.drop()
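The selector "favorites.cities" used with remove() above relies on dot notation to reach into a nested document, and it matches when the array at that path contains the given value. A minimal plain-JavaScript sketch of that behaviour follows; `getPath`, `matchesPath`, and the sample `people` array are hypothetical names, and this is not MongoDB's implementation.

```javascript
// Illustrative only: dot-notation lookup and array membership matching.
// `getPath`, `matchesPath`, and `people` are hypothetical names.
function getPath(doc, path) {
  // Walk "a.b.c" one key at a time, stopping safely on missing fields.
  return path
    .split('.')
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

function matchesPath(doc, path, expected) {
  const value = getPath(doc, path);
  return Array.isArray(value) ? value.includes(expected) : value === expected;
}

const people = [
  { username: 'smith', favorites: { cities: ['Chicago', 'Chennai'] } },
  { username: 'jones', favorites: { cities: ['Boston'] } },
  { username: 'brown' },
];

// Like remove(selector): keep only the documents that do NOT match.
const remaining = people.filter(
  (p) => !matchesPath(p, 'favorites.cities', 'Chennai')
);
console.log(remaining.map((p) => p.username)); // [ 'jones', 'brown' ]
```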
Creating and querying with indexes
• MongoDB’s indexes can be created from the shell.
• Creating a large collection:-
• Indexes can be beneficial if there is a collection with many documents.
• Create a collection named numbers with 20,000 simple documents.
• > for(i = 0; i < 20000; i++)
{
db.numbers.save({num: i});
}
• O/p:-
• WriteResult({ "nInserted" : 1 })
• On the above collection now, we can try different queries as:
• > db.numbers.find({num: 500})
• O/P:-
• { "_id" : ObjectId("4bfbf132dba1aa7c30ac84fe"), "num" : 500 }
• Range Queries can also be implemented using $gt and $lt operators as:-
• > db.numbers.find( {num: {"$gt": 19995 }} )
• O/P:-
• { "_id" : ObjectId("552e660b58cd52bcb2581142"), "num" : 19996 }
• { "_id" : ObjectId("552e660b58cd52bcb2581143"), "num" : 19997 }
• { "_id" : ObjectId("552e660b58cd52bcb2581144"), "num" : 19998 }
• { "_id" : ObjectId("552e660b58cd52bcb2581145"), "num" : 19999 }
• > db.numbers.find( {num: {"$gt": 20, "$lt": 25 }} )
• O/P:-
• { "_id" : ObjectId("552e660558cd52bcb257c33b"), "num" : 21 }
• { "_id" : ObjectId("552e660558cd52bcb257c33c"), "num" : 22 }
• { "_id" : ObjectId("552e660558cd52bcb257c33d"), "num" : 23 }
• { "_id" : ObjectId("552e660558cd52bcb257c33e"), "num" : 24 }
• Indexing and explain( ):-
• When any database receives a query, it must plan out how to execute it.
• This is called a query plan.
• EXPLAIN describes query paths and helps to diagnose slow operations by
determining which indexes a query has used.
• MongoDB has its own version of EXPLAIN that describes how the query executes.
• > db.numbers.find({num: {"$gt": 19995}}).explain("executionStats")
• The "executionStats" argument requests a more detailed mode of output that includes statistics about how the query was executed.
[Figure: Output of explain("executionStats") for the unindexed query]
• The query engine has to scan the entire collection, all 20,000 documents, to return only four results (nReturned).
• The value of the totalKeysExamined field shows the number of index entries scanned, which is zero.
• This large difference between the number of documents scanned and the number returned marks the query as inefficient.
• In a real-world situation, where the collection and the documents are larger, the time needed to process the query would be substantially greater than the eight milliseconds reported here (the exact time varies by machine).
• The collection needs an index so that it can be searched systematically and faster.
• Create an index on the num key of the documents using the createIndex() method:
• > db.numbers.createIndex({num: 1})
• {
• "createdCollectionAutomatically" : false,
• "numIndexesBefore" : 1,
• "numIndexesAfter" : 2,
• "ok" : 1
• }
• num:1 will create an index in ascending order.
• Verify that the index has been created by calling the getIndexes() method.
• The collection now has two indexes.
• 1. The first is the standard _id index that’s automatically built for every
collection.
• 2. The second is created on num.
• The indexes for those fields are called _id_ and num_1, respectively.
> db.numbers.getIndexes()
[
  {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_",
    "ns" : "tutorial.numbers"
  },
  {
    "v" : 1,
    "key" : {
      "num" : 1
    },
    "name" : "num_1",
    "ns" : "tutorial.numbers"
  }
]
[Figure: Output of explain() for the indexed query]
• Because the query now uses the index num_1 on num, it scans only the four documents pertaining to the query.
• This reduces the total time to serve the query.
• Indexes take up some space and can make inserts slightly more expensive, but
they’re an essential tool for query optimization.
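The gap between a collection scan and an indexed lookup can be reproduced with a small plain-JavaScript sketch. It assumes a sorted array as a stand-in for the index (MongoDB's real indexes are B-tree based), and the names `scanGreaterThan`, `indexedGreaterThan`, `docs`, and `index` are hypothetical.

```javascript
// Illustrative only: compare documents examined with and without an "index".
// A sorted array plus binary search stands in for a real index structure.
const docs = [];
for (let i = 0; i < 20000; i++) docs.push({ num: i });

// Collection scan: every document is examined.
function scanGreaterThan(limit) {
  let examined = 0;
  const found = [];
  for (const d of docs) {
    examined++;
    if (d.num > limit) found.push(d);
  }
  return { found, examined };
}

// "Index" on num: sorted keys that point back to document positions.
const index = docs
  .map((d, pos) => ({ key: d.num, pos }))
  .sort((a, b) => a.key - b.key);

function indexedGreaterThan(limit) {
  // Binary search for the first key strictly greater than `limit`.
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key > limit) hi = mid;
    else lo = mid + 1;
  }
  // Only the matching documents are fetched.
  const found = index.slice(lo).map((entry) => docs[entry.pos]);
  return { found, examined: found.length };
}

const scan = scanGreaterThan(19995);
const viaIndex = indexedGreaterThan(19995);
console.log(scan.found.length, scan.examined);         // 4 20000
console.log(viaIndex.found.length, viaIndex.examined); // 4 4
```

The cost noted above is visible here too: the index is an extra structure that takes space and has to be kept sorted whenever a document is inserted.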
Principles of Schema Design
• Database schema design is the process of choosing the best representation for a data
set, given the features of the database system, the nature of the data, and the
application requirements.
• The principles of schema design for relational database systems are well established.
• With RDBMSs, a normalized data model helps to ensure generic queryability and to avoid updates to data that might result in inconsistencies.
• But schema design is never an exact science, even with relational databases.
• MongoDB lacks hard schema design rules.
• There can be more than one good way to model a given data set.
• Many principles can drive schema design, but in practice those principles are flexible, not static.
• Before designing a schema for any database, answer the following questions:
• What are your application access patterns? Understanding the needs of your application informs the schema design and also the choice of database. Understanding your application access patterns is the most important aspect of schema design.
• What’s the basic unit of data?
An RDBMS uses tables with columns and rows.
A key-value store uses keys pointing to amorphous values.
In MongoDB, the basic unit of data is the BSON document.
• What are the capabilities of your database? After understanding the basic data type, find out how to manipulate it. RDBMSs feature ad hoc queries and joins, usually written in SQL. MongoDB also allows ad hoc queries, but SQL-style joins are not supported (newer versions offer a limited join through the $lookup aggregation stage).
• What makes a good unique id or primary key for a record? Every database system has some unique key for each record. Choosing this key carefully can make a big difference in how the data is accessed and how it’s stored.
• The best schema designs are always the product of deep knowledge of the
database, good judgment about the requirements of the application at hand,
and experience.
• A good schema requires experimentation and iteration, such as when an
application scales and performance considerations change.
• It is rarely possible to fully plan an application before its implementation and
have a static schema.
Constructing queries on Databases
• For this topic, refer to the textbook MongoDB in Action.
• Go through the real-life example (case study) of the e-commerce data model given on page no. 99.
Collections and Documents
• Database:-
• In MongoDB, a database contains the collections of documents. One can create
multiple databases on the MongoDB server.
• View Database:
• To see which databases are present on your MongoDB server, use the command:
show dbs
• By default it shows the following three databases:
• O/P:-
• admin 0.000GB
• config 0.000GB
• local 0.000GB
• Creating Database:
• In the mongo shell, a database is created with the following command:
> use database_name
• If the given name does not exist, this command switches to a new database with that name; if the name already exists, it switches to the existing database.
• The new database is actually created only when data is first stored in it.
• Collection:-
• Collections are like tables in an RDBMS: they also store data, but in the form of documents.
• A single database is allowed to store multiple collections.
• Creating collection:
• After creating database create a collection to store documents.
• The collection is created using the following syntax:
> db.createCollection("users")
• To insert data into a collection:
> db.collection_name.insertOne({..})
• Here, the insertOne() function stores a single document in the specified collection.
db.mycollection.insertOne({"name" : "abc"})
• To insert many documents, pass an array:
• db.mycollection.insert( [ { _id: 11, name: "abc", rollno: 10, class: "FE" },
• { name: "xyz", rollno: 20, class: "TE" },
• { name: "pqr", rollno: 25, class: "BE" } ] )
• Document:-
• In MongoDB, data records are stored as BSON documents. A document is made of field-value (key-value) pairs, and the value of a field can be of any BSON type.
• Syntax:
• { field1: value1,
• field2: value2,
• ....
• fieldN: valueN
• }
MongoDB Query Language
• Refer to Experiment No. 6 as the study material for this topic.