ADO Lecture v 2023-25
Lecture V
MBA(DSDA) 2024-26, SCIT
ADO
Session Plan
1. Introduction to ADO, Data, Big Data, Time-Series Data, Spatial Data, Graph Data, Streaming Data, Session Plan, COs (0.5 Session)
2. Features of Databases: Structured/Semi-Structured/Unstructured, SQL DBs, NoSQL DBs, NewSQL DBs, ACID, CAP, and BASE Properties, Distributed Databases (0.5 Session)
3. Journey from RDBMS to NoSQL: BigTable, DynamoDB, HBase, Cassandra, VoltDB (2 Sessions)
4. MongoDB (in Detail) (3-4 Sessions)
5. Neo4j (in Detail) (2 Sessions)
6. Time-Series DB (if Time Permits) (1 Session)
7. Data Lakes and Data Quality Management (1 Session)
ADO
MongoDB
1. Document Data Concept/Model
2. MongoDB platform
3. Basic Commands
4. CRUD
5. Indexing, Data Types
6. File Import/Export
7. GridFS
8. Collection
9. Spatial Features, Time-series
10. Complex Queries
MongoDB
Basic commands
> show dbs
> show collections
> db.stats()
> db.numbers.stats()
MongoDB
Indexing
• Indexes support the efficient execution of queries.
• Adding an index has a negative performance impact on write operations, because every insert, update, or delete must also update the index.
https://www.mongodb.com/docs/manual/core/indexes/
MongoDB
Indexing
• A human resources department often needs …
Single Field Index
• db.collection.createIndex( <keys>, <options>, <commitQuorum> )
Compound Index
• db.<collection>.createIndex( {
    <field1>: <sortOrder>,
    <field2>: <sortOrder>,
    ...
    <fieldN>: <sortOrder>
  } )
• You can specify up to 32 fields in a single compound index.
• <sortOrder> is 1 for ascending or -1 for descending order.
https://www.mongodb.com/docs/manual/reference/method/db.collection.createIndex/
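To make the `<sortOrder>` values concrete, here is a toy model in plain JavaScript (the language mongosh uses) that orders a few documents the way a compound index `{ gpa: -1, name: 1 }` would order its keys. It is a sketch, not MongoDB's B-tree implementation; the `students` documents and the `compareByIndexSpec` helper are hypothetical.

```javascript
// Toy model only: builds a comparator that orders documents the way a
// compound index spec such as { gpa: -1, name: 1 } orders its keys.
// compareByIndexSpec is a hypothetical helper, not a MongoDB API.
function compareByIndexSpec(spec) {
  const fields = Object.entries(spec); // key order = field order in the spec
  return (a, b) => {
    for (const [field, order] of fields) {
      // order is 1 (ascending) or -1 (descending)
      if (a[field] < b[field]) return -order;
      if (a[field] > b[field]) return order;
    }
    return 0; // equal on every indexed field
  };
}

// Hypothetical sample documents.
const students = [
  { name: "Ana", gpa: 3.6 },
  { name: "Ben", gpa: 3.9 },
  { name: "Cai", gpa: 3.6 },
  { name: "Dev", gpa: 3.2 },
];

// Sort by gpa descending, then break ties by name ascending.
const indexOrder = [...students].sort(compareByIndexSpec({ gpa: -1, name: 1 }));
console.log(indexOrder.map((s) => `${s.gpa} ${s.name}`).join(", "));
// → 3.9 Ben, 3.6 Ana, 3.6 Cai, 3.2 Dev
```

In this ordering, all entries with the same gpa sit next to each other, which is the intuition behind a compound index also serving queries on its leading field (its prefix).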
MongoDB
Indexing Types: Single Field Index
https://www.mongodb.com/docs/manual/core/indexes/index-types/
MongoDB
Indexing Types: Compound Index
https://www.mongodb.com/docs/manual/core/indexes/index-types/
MongoDB
Indexing Types: Compound Index
db.students.find( { gpa: 3.6 } )
https://www.mongodb.com/docs/manual/core/indexes/index-types/
MongoDB
Indexing Types: Multikey Index
• Multikey indexes collect and sort data stored in arrays.
• You do not need to explicitly specify the multikey type. When you create an index on a field that contains an array value, MongoDB automatically sets the index to be a multikey index.
https://www.mongodb.com/docs/manual/core/indexes/index-types/
MongoDB
Indexing Types: Multikey Index
https://www.mongodb.com/docs/manual/core/indexes/index-types/
MongoDB
Indexing Types: Multikey Index
db.students.insertMany( [
  {
    "name": "Andre Robinson",
    "test_scores": [ 88, 97 ]
  },
  {
    "name": "Wei Zhang",
    "test_scores": [ 62, 73 ]
  },
  {
    "name": "Jacob Meyer",
    "test_scores": [ 92, 89 ]
  }
] )
https://www.mongodb.com/docs/manual/core/indexes/index-types/
MongoDB
Indexing Types: Multikey Index
db.students.createIndex( { test_scores: 1 } )
https://www.mongodb.com/docs/database-tools/
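A toy model helps show what "collect and sort data stored in arrays" means: each array element becomes its own index entry pointing back to its document. The sketch below uses the three documents from the slide, with `_id` values added for illustration; `buildMultikeyIndex` is a hypothetical helper, and this is not MongoDB's actual index structure.

```javascript
// Toy model only: a multikey index on test_scores stores one entry per
// distinct array element, each pointing back to its document.
// buildMultikeyIndex is a hypothetical helper, not a MongoDB API.
function buildMultikeyIndex(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    // Arrays contribute one key per distinct element; scalars contribute one key.
    const keys = Array.isArray(value) ? [...new Set(value)] : [value];
    for (const key of keys) entries.push({ key, id: doc._id });
  }
  // Ascending order, as for { test_scores: 1 } (numeric keys assumed).
  return entries.sort((a, b) => a.key - b.key);
}

// The slide's documents, with _id values added for illustration.
const students = [
  { _id: 1, name: "Andre Robinson", test_scores: [88, 97] },
  { _id: 2, name: "Wei Zhang", test_scores: [62, 73] },
  { _id: 3, name: "Jacob Meyer", test_scores: [92, 89] },
];

const index = buildMultikeyIndex(students, "test_scores");
console.log(index.map((e) => `${e.key}->${e.id}`).join(" "));
// → 62->2 73->2 88->1 89->3 92->3 97->1

// Range scan similar in spirit to find({ test_scores: { $gte: 90 } }):
// de-duplicate ids because one document can own several matching keys.
const matchingIds = [...new Set(index.filter((e) => e.key >= 90).map((e) => e.id))];
console.log(matchingIds.join(",")); // → 3,1
```

Three documents produce six index entries here, which is why multikey indexes grow with array length.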
MongoDB
File Import
• mongoimport (run from the operating-system command line, not from inside mongosh)
Importing JSON Files
***********
mongoimport --db dsda2325 --collection restaurants --type=json --file G:\primer-dataset.json
OR
> db.restaurants.countDocuments()
25359
> db.restaurants.find()
https://www.mongodb.com/docs/database-tools/
MongoDB
File Import
• mongoimport
Importing CSV Files with a Header Line (the first line supplies the field names)
***********
mongoimport --db dsda2325 --collection trips --drop --type=csv --headerline --file "G:\2014-02-Citi Bike trip data.csv"
https://www.mongodb.com/docs/database-tools/
MongoDB
File Import
• mongoimport
Importing CSV Files without a Header Line
***********
mongoimport
--db dsda2325
--collection=trips
--drop
--file="G:\2014-02-Citi Bike trip data.csv"
--type=csv
--fields="tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,end station longitude,bikeid,usertype,birth year,gender"
(Options are shown one per line for readability; enter them as a single command.)
https://www.mongodb.com/docs/database-tools/
MongoDB
File Import
• mongoimport
Importing CSV Files without a Header Line, with Field Names in a Text File
***********
mongoimport
--db dsda2325
--collection=trips
--drop
--file="G:\2014-02-Citi Bike trip data.csv"
--type=csv
--fieldFile=G:\field_file.txt
(field_file.txt lists one field name per line)
https://www.mongodb.com/docs/database-tools/
MongoDB
File Import
• mongoimport
Importing CSV Files without a Header Line, with Field Names and Column Types in a Text File
***********
mongoimport
--db dsda2325
--collection=trips
--drop
--file="G:\2014-02-Citi Bike trip data.csv"
--type=csv
--columnsHaveTypes
--fieldFile=G:\field_file_with_types.txt
(each line of the field file has the form <fieldName>.<type>(), e.g. tripduration.int32())
https://www.mongodb.com/docs/database-tools/
MongoDB
GridFS
GridFS is a specification for storing and
retrieving files that exceed the BSON-document
size limit of 16 MB.
https://www.mongodb.com/docs/manual/core/gridfs/
MongoDB
GridFS
• GridFS is the MongoDB specification for storing and retrieving large files such as images, audio files, and video files.
• It works like a file system, but the file data is stored in MongoDB collections.
• GridFS can store files larger than the 16 MB BSON document size limit.
https://www.mongodb.com/docs/manual/core/gridfs/
MongoDB
GridFS
• GridFS divides a file into chunks and stores each chunk in a separate document; by default each chunk is at most 255 KiB.
• By default, GridFS uses two collections: fs.files, which stores each file's metadata, and fs.chunks, which stores the chunks.
https://www.mongodb.com/docs/manual/core/gridfs/
MongoDB
GridFS
• chunkSize (a field of each fs.files document) is the size of each chunk in bytes. GridFS divides the file into chunks of size chunkSize, except for the last chunk, which is only as large as needed. The default size is 255 kibibytes (KiB).
https://www.mongodb.com/docs/manual/core/gridfs/
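The chunking rule can be checked with a little arithmetic. The sketch below slices a byte array into numbered chunks of chunkSize bytes, mirroring the `n` and `data` fields that fs.chunks documents carry; `splitIntoChunks` is a hypothetical helper (real uploads go through a driver or mongofiles), and the 20 MiB file is an assumed example.

```javascript
// Default GridFS chunk size: 255 KiB = 255 * 1024 = 261,120 bytes.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Sketch of the chunking rule: consecutive slices of chunkSize bytes,
// numbered n = 0, 1, 2, ...; only the last slice may be shorter.
// splitIntoChunks is a hypothetical helper, not a driver API.
function splitIntoChunks(bytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let n = 0, offset = 0; offset < bytes.length; n++, offset += chunkSize) {
    chunks.push({ n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

const file = new Uint8Array(20 * 1024 * 1024); // hypothetical 20 MiB file of zeros
const chunks = splitIntoChunks(file);
console.log(chunks.length, chunks[chunks.length - 1].data.length); // → 81 81920
```

So a 20 MiB file needs 80 full chunks (80 × 261,120 = 20,889,600 bytes) plus one final chunk of the remaining 81,920 bytes.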
MongoDB
GridFS
• To change the default chunk size, use the following command:
https://www.mongodb.com/docs/manual/core/gridfs/
MongoDB
GridFS
Go to the bin directory of the MongoDB Database Tools (version 100)
show databases
use video
show collections
use video
show collections
db.dropDatabase()
https://www.mongodb.com/docs/manual/core/gridfs/
MongoDB
DataTypes
MongoDB supports many datatypes. Some of them are −
• String − The most commonly used datatype for storing data. Strings in MongoDB must be valid UTF-8.
• Integer − Stores whole numbers. BSON has a 32-bit integer type (int) and a 64-bit integer type (long).
• Boolean − Stores a boolean (true/false) value.
• Double − Stores floating-point values.
• Min/Max keys − Used to compare a value against the lowest and highest BSON elements.
• Arrays − Stores arrays, lists, or multiple values under one key.
• Timestamp − A special BSON type used internally by MongoDB (seconds since the Unix epoch plus an incrementing ordinal). It can record when a document was added or modified, but for application data the Date type is generally recommended.
• Object − Used for embedded documents.
https://www.mongodb.com/docs/manual/reference/bson-types/
MongoDB
DataTypes
MongoDB supports many datatypes. Some of them are − (contd…)
• Null − Stores a null value.
• Symbol − Used identically to a string, but generally reserved for languages that have a specific symbol type (deprecated).
• Date − Stores a date/time as a 64-bit integer count of milliseconds since the Unix epoch. You can create your own date by constructing a Date object and passing the year, month, and day (in JavaScript the month is 0-based).
• Object ID − Stores the document's ID.
• Binary data − Stores binary data.
• Code − Stores JavaScript code in the document.
• Regular expression − Stores regular expressions.
https://www.mongodb.com/docs/manual/reference/bson-types/
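A short illustration of the Date bullet in plain JavaScript, which mongosh also uses for `new Date()` (mongosh additionally accepts `ISODate("2024-01-15")`). The date chosen is arbitrary; the point is the millisecond unit and the 0-based month.

```javascript
// A BSON Date is a signed 64-bit count of milliseconds since the Unix epoch.
// JavaScript's Date works in the same unit.
const epoch = new Date(0);
console.log(epoch.toISOString()); // → 1970-01-01T00:00:00.000Z

// Building a specific date: the month argument is 0-based (0 = January).
// Date.UTC avoids depending on the machine's local time zone.
const d = new Date(Date.UTC(2024, 0, 15)); // 15 January 2024, 00:00 UTC
console.log(d.toISOString()); // → 2024-01-15T00:00:00.000Z
console.log(d.getTime());     // → 1705276800000 (milliseconds since the epoch)
```

Passing 1 instead of 0 as the month would produce 15 February, a common source of off-by-one-month bugs.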
MongoDB
DataTypes
https://www.mongodb.com/docs/manual/reference/bson-types/
MongoDB
DataTypes
The $type operator supports using these values
to query fields by their BSON type. $type also
supports the number alias, which matches the
integer, decimal, double, and long BSON types.
https://www.mongodb.com/docs/manual/reference/bson-types/
MongoDB
DataTypes https://www.prisma.io/dataguide/mongodb/mongodb-datatypes
db.types.insertMany( [
  { _id: 1, value: 1, expectedType: 'Int32' },
  { _id: 2, value: Long("1"), expectedType: 'Long' },
  { _id: 3, value: 1.01, expectedType: 'Double' },
  { _id: 4, value: Decimal128("1.01"), expectedType: 'Decimal128' },
  { _id: 5, value: 3200000001, expectedType: 'Double' }
] )
db.types.find( { "value": { $type: "int" } } )
db.types.find( { "value": { $type: "long" } } )
db.types.find( { "value": { $type: "decimal" } } )
db.types.find( { "value": { $type: "double" } } )
db.types.find( { "value": { $type: "number" } } )   // "number" matches int, long, double, and decimal
db.types.find( { "value": { $type: 19 } } )         // 19 is the numeric code for "decimal"
db.types.find( { "value": 1.01 } )
db.types.find( { "value": 1 } )
db.mytestcoll.find( { "first_name": { $type: "string" } } )
db.mytestcoll.find( { "first_name": { $type: 2 } } )   // 2 is the numeric code for "string"
https://www.mongodb.com/docs/mongodb-shell/reference/data-types/#std-label-
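Why does `_id: 5` expect 'Double' while `_id: 1` expects 'Int32'? A plain number literal in mongosh becomes Int32 only if it is a whole number within the 32-bit range; 3200000001 exceeds 2,147,483,647. The sketch below encodes that rule under the assumption that it mirrors mongosh's default serialization of JavaScript numbers (`Long(...)` and `Decimal128(...)` are explicit wrappers and are not modeled). `bsonTypeOfNumberLiteral` is a hypothetical helper; the type codes are the BSON numbers that `$type` accepts.

```javascript
// BSON type codes for the aliases used in the $type queries above.
const TYPE_CODES = { double: 1, string: 2, int: 16, long: 18, decimal: 19 };

const INT32_MIN = -(2 ** 31);   // -2,147,483,648
const INT32_MAX = 2 ** 31 - 1;  //  2,147,483,647

// Sketch (assumption: mirrors mongosh's default serialization of plain
// JavaScript numbers): whole numbers in int32 range become "int",
// everything else becomes "double".
function bsonTypeOfNumberLiteral(value) {
  return Number.isInteger(value) && value >= INT32_MIN && value <= INT32_MAX
    ? "int"
    : "double";
}

for (const v of [1, 1.01, 3200000001]) {
  const alias = bsonTypeOfNumberLiteral(v);
  console.log(v, alias, TYPE_CODES[alias]);
}
// → 1 int 16
// → 1.01 double 1
// → 3200000001 double 1
```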
MongoDB
DataTypes- ObjectID
ObjectId(<value>)
Returns a new ObjectId. The 12-byte ObjectId consists of:
• A 4-byte timestamp, representing the ObjectId's creation, measured in seconds
since the Unix epoch.
• A 5-byte random value generated once per process. This random value is unique
to the machine and process.
• A 3-byte incrementing counter, initialized to a random value.
For timestamp and counter values, the most significant bytes appear first in the byte
sequence (big-endian). This is unlike other BSON values, where the least significant
bytes appear first (little-endian).
If an integer value is used to create an ObjectId, the integer replaces the timestamp.
https://www.mongodb.com/docs/manual/reference/method/ObjectId/#mongodb-
MongoDB
DataTypes- ObjectID
https://www.mongodb.com/docs/manual/reference/method/ObjectId/#mongodb-
MongoDB
DataTypes- ObjectID
https://www.mongodb.com/docs/manual/reference/method/ObjectId/#mongodb-
MongoDB
DataTypes- ObjectID
ObjectId("00000020a7a0ee98923c771e")
The example ObjectId consists of:
• A four-byte timestamp, 00000020
• A five-byte random element, a7a0ee9892
• A three-byte counter, 3c771e
The first four bytes of the ObjectId are the number of seconds since the Unix epoch. In this example, the timestamp is 0x00000020, which is 32 in decimal, i.e. 32 seconds after the epoch (1970-01-01T00:00:32Z).
https://www.mongodb.com/docs/manual/reference/method/ObjectId/#mongodb-
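The byte layout can be decoded by hand from the 24-digit hex string (2 hex digits per byte). mongosh already provides `ObjectId(...).getTimestamp()` for the timestamp part; the sketch below does the whole split manually to show where each field lives. `parseObjectId` is a hypothetical helper.

```javascript
// Sketch: split a 24-hex-digit ObjectId string into its three parts.
// parseObjectId is a hypothetical helper, not a MongoDB API.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("an ObjectId is 12 bytes = 24 hex digits");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4-byte big-endian timestamp
  return {
    seconds,
    createdAt: new Date(seconds * 1000),      // Date expects milliseconds
    random: hex.slice(8, 18),                 // 5-byte random value
    counter: parseInt(hex.slice(18, 24), 16), // 3-byte big-endian counter
  };
}

const parts = parseObjectId("00000020a7a0ee98923c771e");
console.log(parts.seconds, parts.createdAt.toISOString(), parts.random, parts.counter.toString(16));
// → 32 1970-01-01T00:00:32.000Z a7a0ee9892 3c771e
```

Because the timestamp and counter are big-endian, reading the hex digits left to right gives their values directly.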
MongoDB
DataTypes-
Date
Timestamp
String
Null
Undefined
….
MongoDB
DataTypes- Timestamp()
MongoDB
Hands-on
$match stage – filters the documents we need to work with (those that match our criteria)
$group stage – performs the aggregation (groups documents and computes values per group)
$sort stage – sorts the resulting documents in the order we require (ascending or descending)
https://www.mongodb.com/docs/manual/core/aggregation-pipeline/
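The three stages can be emulated on an in-memory array to show how each stage's output feeds the next. This is a toy model in plain JavaScript, not the server's pipeline engine; the `orders` documents and the `match`, `groupSum`, and `sortBy` helpers are hypothetical, and the equivalent mongosh call is shown in the comments.

```javascript
// Toy emulation of a three-stage pipeline. The mongosh equivalent would be
// (hypothetical "orders" collection):
//   db.orders.aggregate([
//     { $match: { status: "A" } },
//     { $group: { _id: "$item", total: { $sum: "$qty" } } },
//     { $sort: { total: -1 } }
//   ])

// $match: keep only documents that satisfy the predicate.
const match = (pred) => (docs) => docs.filter(pred);

// $group with a $sum accumulator: one output document per distinct key.
const groupSum = (keyOf, valueOf) => (docs) => {
  const totals = new Map();
  for (const d of docs) {
    const k = keyOf(d);
    totals.set(k, (totals.get(k) ?? 0) + valueOf(d));
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};

// $sort on one numeric field: 1 = ascending, -1 = descending.
const sortBy = (field, order) => (docs) =>
  [...docs].sort((a, b) => (a[field] - b[field]) * order);

// Hypothetical sample documents.
const orders = [
  { item: "pen", status: "A", qty: 10 },
  { item: "ink", status: "A", qty: 5 },
  { item: "pen", status: "B", qty: 7 },
  { item: "pen", status: "A", qty: 3 },
  { item: "pad", status: "A", qty: 20 },
];

// Each stage's output is the next stage's input.
const pipeline = [
  match((d) => d.status === "A"),
  groupSum((d) => d.item, (d) => d.qty),
  sortBy("total", -1),
];
const result = pipeline.reduce((docs, stage) => stage(docs), orders);
console.log(result.map((r) => `${r._id} ${r.total}`).join(", ")); // → pad 20, pen 13, ink 5
```

Filtering first keeps the pen order with status "B" out of the totals, which is why $match usually comes as early as possible in a pipeline.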
MongoDB
Aggregation
https://www.mongodb.com/docs/manual/core/aggregation-pipeline/
MongoDB
Aggregation
The $group stage supports certain expressions (operators)
allowing users to perform arithmetic, array, boolean and other
operations as part of the aggregation pipeline.
https://www.mongodb.com/docs/manual/core/aggregation-pipeline/