This document provides an overview and comparison of SQL and NoSQL databases. It begins by defining SQL and NoSQL databases and listing some of their key characteristics. SQL databases are relational, use structured query language (SQL), and have ACID transactions, while NoSQL databases are non-relational, use dynamic schemas, and have BASE consistency. The document then discusses some examples of SQL and NoSQL databases and different NoSQL database types like document stores, key-value stores, and column stores. It also covers MongoDB specifically, providing definitions and examples.
NoSQL refers to a family of databases built to manage very large sets of unstructured data, in which the data is not stored in tabular relations as it is in relational databases. Most existing relational databases struggle with some complex modern problems, such as:
• Continuously changing nature of data - structured, semi-structured, unstructured and polymorphic data.
• Applications now serve millions of users in different geo-locations and time zones, and have to be up and running all the time with data integrity maintained.
• Applications are becoming more distributed with many moving towards cloud computing.
NoSQL plays a vital role in enterprise applications that need to access and analyze massive data sets spread across multiple remote virtual servers in cloud infrastructure, especially when the data is not structured. NoSQL databases are therefore designed to overcome the performance, scalability, data-modelling, and distribution limitations seen in relational databases.
5. CAP theorem for NoSQL
What the CAP theorem really says:
• If you cannot limit the number of faults and requests can be directed to any server and you insist on serving every request you receive then you cannot possibly be consistent
How it is interpreted:
• You must always give something up: consistency, availability or tolerance to failure and reconfiguration
Eric Brewer 2001
6. Theory of NOSQL: CAP
GIVEN:
• Many nodes
• Nodes contain replicas of partitions of the data
• Consistency
– All replicas contain the same version of data
– Client always has the same view of the data (no matter what node)
• Availability
– System remains operational on failing nodes
– All clients can always read and write
• Partition tolerance
– Multiple entry points
– System remains operational on system split (communication malfunction)
– System works well across physical network partitions
CAP Theorem: satisfying all three at the same time is impossible
Figure: CAP triangle (C, A, P)
8. Sharding of data
• Distributes a single logical database system across a cluster of machines
• Uses range-based partitioning to distribute documents based on a specific shard key
• Automatically balances the data associated with each shard
• Can be turned on and off per collection (table)
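The range-based partitioning described above can be sketched in plain JavaScript. This is a minimal illustration and not MongoDB's implementation: the chunk boundaries, the shard names, the `partId` shard key, and the `routeToShard` helper are all hypothetical.

```javascript
// Hypothetical chunk table: each chunk owns a half-open range [min, max)
// of shard-key values and lives on one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// Return the name of the shard whose range contains the document's shard-key value.
function routeToShard(doc, shardKey) {
  const value = doc[shardKey];
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  if (!chunk) throw new Error(`no chunk covers ${shardKey}=${value}`);
  return chunk.shard;
}

// Route three documents by the assumed shard key "partId".
const placements = [{ partId: 42 }, { partId: 1000 }, { partId: 7500 }]
  .map((doc) => routeToShard(doc, "partId"));
console.log(placements);
```

Because each chunk covers a contiguous key range, documents with neighbouring shard-key values tend to land on the same shard, which is why the choice of shard key matters for balance.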
9. Replica Sets
• Redundancy and Failover
• Zero downtime for upgrades and maintenance
• Master-slave replication
• Strong Consistency
• Delayed Consistency
• Geospatial features
Figure: a client connected to replica set replica1 (Host1:10000, Host2:10001, Host3:10002)
10. How does NoSQL vary from RDBMS?
• Looser schema definition
• Applications written to deal with specific documents/data
• Applications aware of the schema definition as opposed to the data
• Designed to handle distributed, large databases
• Trade-offs:
– No strong support for ad hoc queries but designed for speed and growth of database
– Query language through the API
– Relaxation of the ACID properties
11. Benefits of NoSQL
Elastic Scaling
• RDBMS scale up: bigger load, bigger server
• NoSQL scale out: distribute data across multiple hosts seamlessly
DBA Specialists
• RDBMS require highly trained experts to monitor the DB
• NoSQL requires less management, with automatic repair and simpler data models
Big Data
• Huge increase in data; RDBMS capacity and data-volume constraints are at their limits
• NoSQL designed for big data
12. Benefits of NoSQL
Flexible data models
• Schema changes for RDBMS have to be carefully managed
• NoSQL databases are more relaxed in structure of data
• Database schema changes do not have to be managed as one complicated change unit
• Application already written to address an amorphous schema
Economics
• RDBMS rely on expensive proprietary servers to manage data
• NoSQL: clusters of cheap commodity servers manage the data and transaction volumes
• Cost per gigabyte or transaction/second for NoSQL can be lower than the cost for an RDBMS
13. Drawbacks of NoSQL
• Support
– RDBMS vendors provide a high level of support to clients
– Stellar reputation
– NoSQL databases are open source projects with startups supporting them
– Reputation not yet established
• Maturity
– RDBMS are mature products: stable and dependable
– Also means old, no longer cutting edge or interesting
– NoSQL databases are still implementing their basic feature set
14. Drawbacks of NoSQL
• Administration
– RDBMS administrator is a well-defined role
– NoSQL's goal: no administrator necessary; however, NoSQL still requires effort to maintain
• Lack of Expertise
– Whole workforce of trained and seasoned RDBMS developers
– Still recruiting developers to the NoSQL camp
• Analytics and Business Intelligence
– RDBMS designed to address this niche
– NoSQL designed to meet the needs of a Web 2.0 application, not designed for ad hoc query of the data
– Tools are being developed to address this need
15. RDB ACID to NoSQL BASE
Pritchett, D.: BASE: An Acid Alternative (queue.acm.org/detail.cfm?id=1394128)
ACID:
• Atomicity
• Consistency
• Isolation
• Durability
BASE:
• Basically Available
• Soft state (state of system may change over time)
• Eventually consistent (asynchronous propagation)
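The "soft state" and "eventually consistent" properties can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not how any particular database replicates: the `Node` class, the `replicationLog` queue, and the `write` and `replicate` helpers are hypothetical names.

```javascript
// Toy model: a write lands on the primary at once and reaches the replica
// only when the queued replication log is applied (asynchronous propagation).
class Node {
  constructor() { this.data = new Map(); }
  read(key) { return this.data.get(key); }
}

const primary = new Node();
const replica = new Node();
const replicationLog = []; // writes accepted by the primary but not yet applied to the replica

function write(key, value) {
  primary.data.set(key, value);
  replicationLog.push([key, value]);
}

// Apply every pending write to the replica, in the order it was accepted.
function replicate() {
  while (replicationLog.length > 0) {
    const [key, value] = replicationLog.shift();
    replica.data.set(key, value);
  }
}

write("stock", 15);
const staleRead = replica.read("stock");     // replica has not caught up yet
replicate();
const convergedRead = replica.read("stock"); // replica now agrees with the primary
console.log(staleRead, convergedRead);
```

Between the write and the call to `replicate`, the two nodes disagree; that window is the price paid for accepting the write without waiting for every replica.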
17. What is MongoDB?
• Developed by 10gen
• Founded in 2007
• A document-oriented, NoSQL database
• Hash-based, schema-less database
– No Data Definition Language
– In practice, this means you can store hashes with any keys and values that you choose
– Keys are a basic data type but in reality stored as strings
– Document identifiers (_id) will be created for each document; the field name is reserved by the system
– Application tracks the schema and mapping
• Uses BSON format
– Based on JSON; B stands for Binary
• Written in C++
• Supports APIs (drivers) in many computer languages
– JavaScript, Python, Ruby, Perl, Java, Scala, C#, C++, Haskell, Erlang
18. Functionality of MongoDB
• Dynamic schema
• No DDL
• Document-based database
• Secondary indexes
• Query language via an API
• Atomic writes and fully-consistent reads
• If system configured that way
• Master-slave replication with automated failover (replica sets)
• Built-in horizontal scaling via automated range-based partitioning of data (sharding)
• No joins nor transactions
19. Why use MongoDB?
• Simple queries
• Functionality provided applicable to most web applications
• Easy and fast integration of data
• No ERD diagram
• Not well suited for heavy and complex transactions systems
20. MongoDB: CAP approach
Focus on Consistency and Partition tolerance
• Consistency
– all replicas contain the same version of the data
• Availability
– system remains operational on failing nodes
• Partition tolerance
– multiple entry points
– system remains operational on system split
CAP Theorem: satisfying all three at the same time is impossible
Figure: CAP triangle (C, A, P)
21. MongoDB: Hierarchical Objects
• A MongoDB instance may have zero or more ‘databases’.
• A database may have zero or more ‘collections’.
• A collection may have zero or more ‘documents’.
• A document may have one or more ‘fields’.
• MongoDB ‘Indexes’ function much like their RDBMS counterparts.
Figure: nested containment, from outermost to innermost: 0 or more Databases, 0 or more Collections, 0 or more Documents, 0 or more Fields
22. RDB Concepts to NO SQL
RDBMS          MongoDB
Database       Database
Table, View    Collection
Row            Document (BSON)
Column         Field
Index          Index
Join           Embedded Document
Foreign Key    Reference
Partition      Shard
Notes:
• A collection is not strict about what it stores (schema-less)
• Hierarchy is evident in the design
• Embedded Document ?
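The mapping of joins to embedded documents and of foreign keys to references can be sketched with plain JavaScript objects. The order and part data, the `partIds` field, and the `resolveParts` helper are hypothetical and chosen only for illustration.

```javascript
// Hypothetical order data modelled two ways.

// 1) Embedded: the line items live inside the order document,
//    so a single read returns the order and its items together.
const embeddedOrder = {
  _id: 1,
  customer: "ben",
  items: [
    { type: "hammer", quantity: 1 },
    { type: "screwdriver", quantity: 15 },
  ],
};

// 2) Referenced: the order stores part _ids (like foreign keys),
//    and the application resolves them itself.
const parts = [
  { _id: 10, type: "hammer" },
  { _id: 11, type: "screwdriver" },
];
const referencedOrder = { _id: 2, customer: "ben", partIds: [10, 11] };

// Application-side "join": look up each referenced part by _id.
function resolveParts(order, partsCollection) {
  const byId = new Map(partsCollection.map((p) => [p._id, p]));
  return order.partIds.map((id) => byId.get(id));
}

const embeddedTypes = embeddedOrder.items.map((item) => item.type);
const referencedTypes = resolveParts(referencedOrder, parts).map((p) => p.type);
console.log(embeddedTypes, referencedTypes);
```

Embedding suits data that is read together with its parent; references suit data shared by many documents, at the cost of a second lookup.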
23. MongoDB Processes and configuration
• mongod: database instance
• mongos: sharding process
– Analogous to a database router
– Processes all requests
– Decides how many and which mongods should receive the query
– Collates the results and sends them back to the client
• mongo: an interactive shell (a client)
– Fully functional JavaScript environment for use with MongoDB
• You can have one mongos for the whole system no matter how many mongods you have
• OR you can have one local mongos for every client if you want to minimize network latency
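The routing role described for mongos (choose which mongods receive the query, then collate the results) can be sketched as follows. This is a simplified illustration rather than the real router: the `shards` array, the `partId` shard key, and the `route` function are hypothetical.

```javascript
// Hypothetical shards: each owns a range [min, max) of the shard key `partId`
// and holds some documents.
const shards = [
  { name: "shardA", min: 0, max: 100, docs: [{ partId: 7, type: "hammer" }] },
  {
    name: "shardB", min: 100, max: 200,
    docs: [{ partId: 150, type: "hammer" }, { partId: 160, type: "saw" }],
  },
];

// Router sketch: contact one shard when the query pins the shard key,
// otherwise ask every shard, then merge the per-shard results.
function route(query) {
  const contacted = "partId" in query
    ? shards.filter((s) => query.partId >= s.min && query.partId < s.max)
    : shards;
  const matches = (doc) => Object.entries(query).every(([k, v]) => doc[k] === v);
  const results = contacted.flatMap((s) => s.docs.filter(matches));
  return { contacted: contacted.map((s) => s.name), results };
}

const targeted = route({ partId: 150 });     // query includes the shard key
const broadcast = route({ type: "hammer" }); // query does not include the shard key
console.log(targeted.contacted, broadcast.contacted, broadcast.results.length);
```

A query that includes the shard key touches only the shard that can hold matching documents, while any other query has to be sent to all shards.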
24. Choices made for Design of MongoDB
• Scale horizontally over commodity hardware
• Lots of relatively inexpensive servers
• Keep the functionality that works well in RDBMSs
– Ad hoc queries
– Fully featured indexes
– Secondary indexes
• What doesn’t distribute well in RDB?
– Long running multi-row transactions
– Joins
– Both artifacts of the relational data model (row x column)
25. BSON format
• Binary-encoded serialization of JSON-like documents
• Zero or more key/value pairs are stored as a single entity
• Each entry consists of a field name, a data type, and a value
• Large elements in a BSON document are prefixed with a length field to facilitate scanning
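The layout described above (a length prefix, then entries made of a type, a field name, and a value) can be made concrete with a small encoder. This is a minimal sketch for Node.js that handles only strings and 32-bit integers; the `encodeBson` function is a hypothetical helper and not part of any MongoDB driver. The type bytes 0x02 (string) and 0x10 (int32) and the little-endian int32 length fields follow the published BSON specification.

```javascript
// Encode a flat document whose values are strings or 32-bit integers.
// Layout: int32 total length, the elements, then a terminating 0x00 byte.
function encodeBson(doc) {
  const parts = [];
  for (const [name, value] of Object.entries(doc)) {
    const nameBytes = Buffer.from(name + "\0", "utf8"); // field name as a NUL-terminated string
    if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(text.length); // length prefix counts the trailing NUL
      parts.push(Buffer.from([0x02]), nameBytes, len, text);
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value);
      parts.push(Buffer.from([0x10]), nameBytes, num);
    } else {
      throw new TypeError(`unsupported value for field ${name}`);
    }
  }
  const body = Buffer.concat(parts);
  const header = Buffer.alloc(4);
  header.writeInt32LE(4 + body.length + 1); // header + elements + terminator
  return Buffer.concat([header, body, Buffer.from([0x00])]);
}

const encoded = encodeBson({ hello: "world" });
console.log(encoded.length, encoded.toString("hex"));
```

The leading length lets a reader skip a whole document, and the per-string length lets it skip a value, without parsing the bytes in between.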
26. Schema Free
• MongoDB does not need any pre-defined data schema
• Every document in a collection could have different data
• Addresses NULL data fields
{name: "jeff",
 eyes: "blue",
 loc: [40.7, 73.4],
 boss: "ben"}
{name: "brendan",
 aliases: ["el diablo"]}
{name: "ben",
 hat: "yes"}
{name: "matt",
 pizza: "DiGiorno",
 height: 72,
 loc: [44.6, 71.3]}
{name: "will",
 eyes: "blue",
 birthplace: "NY",
 aliases: ["bill", "la ciacco"],
 loc: [32.7, 63.4],
 boss: "ben"}
27. JSON format
• Data is in name/value pairs
• A name/value pair consists of a field name followed by a colon, followed by a value:
• Example: "name": "R2-D2"
• Data is separated by commas
• Example: "name": "R2-D2", "race": "Droid"
• Curly braces hold objects
• Example: {"name": "R2-D2", "race": "Droid", "affiliation": "rebels"}
• An array is stored in brackets []
• Example: [ {"name": "R2-D2", "race": "Droid", "affiliation": "rebels"}, {"name": "Yoda", "affiliation": "rebels"} ]
28. MongoDB Features
• Document-Oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce functionality
Agile, Scalable
29. Index Functionality
• B+ tree indexes
• An index is automatically created on the _id field (the primary
key)
• Users can create other indexes to improve query performance
or to enforce unique values for a particular field
• Supports single-field indexes as well as compound indexes
• As in SQL, the order of the fields in a compound index matters
• If you index a field that holds an array value, MongoDB creates
separate index entries for every element of the array
• The sparse property of an index ensures that the index only
contains entries for documents that have the indexed field (so
it ignores records that do not have the field defined)
• If an index is both unique and sparse, the system will
reject records that have a duplicate key value but allow
records that do not have the indexed field defined
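The unique-plus-sparse rule can be mimicked with a small in-memory stand-in. This is only a sketch of the behaviour described above, not MongoDB code; the class UniqueSparseIndex, the field name serial, and the JSON-based key encoding are all assumptions made for illustration.

```javascript
// In-memory stand-in for a unique, sparse index on one field (not MongoDB code).
class UniqueSparseIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Set();   // one entry per indexed value
  }

  // Returns true if the document is accepted, false if it is rejected as a duplicate.
  insert(doc) {
    if (!(this.field in doc)) return true;       // sparse: no field, so no index entry
    const key = JSON.stringify(doc[this.field]); // hypothetical key encoding
    if (this.keys.has(key)) return false;        // unique: duplicate key is rejected
    this.keys.add(key);
    return true;
  }
}

const index = new UniqueSparseIndex("serial");
const results = [
  index.insert({ type: "hammer", serial: "A1" }),
  index.insert({ type: "wrench" }),             // no serial field
  index.insert({ type: "pliers" }),             // still no serial field, still accepted
  index.insert({ type: "saw", serial: "A1" }),  // duplicate serial
];
console.log(results);  // [ true, true, true, false ]
```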
31. Create Operations
db.collection specifies the collection or the 'table' to store the
document
• db.collection_name.insert( <document> )
• Omit the _id field to have MongoDB generate a unique key
• Example: db.parts.insert( { type: "screwdriver", quantity: 15 } )
• db.parts.insert( { _id: 10, type: "hammer", quantity: 1 } )
• db.collection_name.update( <query>, <update>, { upsert: true } )
• Updates records in the collection that satisfy the query (only the first
match unless multi: true is also set); with upsert: true, creates a new
record if none match
• db.collection_name.save( <document> )
• Updates an existing record or creates a new record
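The "update an existing record or create a new one" behaviour of save can be sketched against an in-memory map. This is an illustration under stated assumptions, not the server's implementation: the generated-N identifiers are a hypothetical stand-in for the unique keys MongoDB generates, and the function returns a label only so the outcome is visible.

```javascript
// In-memory stand-in for a collection keyed by _id (not MongoDB code).
const collection = new Map();
let nextId = 1;  // hypothetical id generator standing in for generated keys

function save(doc) {
  const _id = "_id" in doc ? doc._id : `generated-${nextId++}`;
  const existed = collection.has(_id);
  collection.set(_id, { ...doc, _id });  // replace the whole document, or create it
  return existed ? "updated" : "inserted";
}

const outcomes = [
  save({ _id: 10, type: "hammer", quantity: 1 }),  // new _id
  save({ _id: 10, type: "hammer", quantity: 2 }),  // same _id, so the record is replaced
  save({ type: "screwdriver", quantity: 15 }),     // no _id, so a key is generated
];
console.log(outcomes);  // [ 'inserted', 'updated', 'inserted' ]
```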
32. Read Operations
• db.collection.find( <query>, <projection> ).<cursor modifier>
• Provides functionality similar to the SELECT command
• <query> is the WHERE condition; <projection> lists the fields in the result set
• Example: var PartsCursor = db.parts.find( { type:
"hammer" } ).limit(5)
• Has cursors to handle a result set
• Can modify the query to impose limits, skips, and sort orders
• Can specify to return the 'top' number of records from the result
set
• db.collection.findOne( <query>, <projection> )
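To make the roles of query, projection, and cursor modifiers concrete, here is a rough in-memory analogue using plain arrays. The sample data is hypothetical and no MongoDB driver is involved; it only mirrors what a find with sort, skip, and limit would return.

```javascript
// Hypothetical sample data standing in for the parts collection.
const parts = [
  { _id: 1, type: "hammer", quantity: 1 },
  { _id: 2, type: "screwdriver", quantity: 15 },
  { _id: 3, type: "hammer", quantity: 7 },
  { _id: 4, type: "hammer", quantity: 3 },
];

// Rough analogue of:
//   db.parts.find({type: "hammer"}, {quantity: 1, _id: 0})
//     .sort({quantity: -1}).skip(1).limit(2)
const page = parts
  .filter((p) => p.type === "hammer")       // <query>
  .sort((a, b) => b.quantity - a.quantity)  // sort descending by quantity
  .slice(1, 1 + 2)                          // skip 1, then limit 2
  .map((p) => ({ quantity: p.quantity }));  // <projection>

console.log(page);  // [ { quantity: 3 }, { quantity: 1 } ]
```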
33. Query Operators
Name        Description
$eq         Matches values that are equal to a specified value
$gt, $gte   Matches values that are greater than (or equal to) a specified value
$lt, $lte   Matches values that are less than (or equal to) a specified value
$ne         Matches values that are not equal to a specified value
$in         Matches any of the values specified in an array
$nin        Matches none of the values specified in an array
$or         Joins query clauses with a logical OR; returns all documents that match either clause
$and        Joins query clauses with a logical AND
$not        Inverts the effect of a query expression
$nor        Joins query clauses with a logical NOR
$exists     Matches documents that have a specified field
https://docs.mongodb.org/manual/reference/operator/query/
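The semantics of several of these operators can be sketched with a small in-memory matcher. This is a simplified illustration, not MongoDB's query engine: the function matches and the operators table are hypothetical names, only conditions of the form {field: {$op: value}} plus top-level $or and $and are supported, and array-valued fields and implicit equality are left out.

```javascript
// Each operator takes (field value, whether the field exists, operand).
const operators = {
  $eq: (v, has, x) => v === x,
  $ne: (v, has, x) => v !== x,
  $gt: (v, has, x) => has && v > x,
  $gte: (v, has, x) => has && v >= x,
  $lt: (v, has, x) => has && v < x,
  $lte: (v, has, x) => has && v <= x,
  $in: (v, has, xs) => xs.includes(v),
  $nin: (v, has, xs) => !xs.includes(v),
  $exists: (v, has, x) => has === x,
};

// Supports {field: {$op: operand}} conditions plus top-level $or and $and.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$or") return cond.some((q) => matches(doc, q));
    if (key === "$and") return cond.every((q) => matches(doc, q));
    const has = key in doc;
    return Object.entries(cond).every(([op, x]) => operators[op](doc[key], has, x));
  });
}

// Hypothetical sample documents with differing fields.
const people = [
  { name: "jeff", eyes: "blue", boss: "ben" },
  { name: "matt", pizza: "DiGiorno", height: 72 },
  { name: "ben", hat: "yes" },
];

const tall = people
  .filter((p) => matches(p, { height: { $gte: 70 } }))
  .map((p) => p.name);
const hatOrJeff = people
  .filter((p) => matches(p, { $or: [{ hat: { $exists: true } }, { name: { $in: ["jeff"] } }] }))
  .map((p) => p.name);

console.log(tall);       // [ 'matt' ]
console.log(hatOrJeff);  // [ 'jeff', 'ben' ]
```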
34. Update Operations
• db.collection_name.insert( <document> )
• Omit the _id field to have MongoDB generate a unique key
• Example: db.parts.insert( { type: "screwdriver", quantity: 15 } )
• db.parts.insert( { _id: 10, type: "hammer", quantity: 1 } )
• db.collection_name.save( <document> )
• Updates an existing record or creates a new record
• db.collection_name.update( <query>, <update>, { upsert: true } )
• Updates records in the collection that satisfy the query (only the first
match unless multi: true is also set)
• db.collection_name.findAndModify( { query: <query>, sort: <sort>,
update: <update>, new: <boolean>, fields: <fields>, upsert: <boolean> } )
• Modifies an existing record and returns either the old or the new version of it
35. Delete Operations
• db.collection_name.remove(<query>, <justone>)
• Deletes all records from a collection, or only those matching a criterion
• <justone> - specifies to delete only 1 record matching the criterion
• Example: db.parts.remove( { type: /^h/ } ) - removes all parts whose type starts
with h
• db.parts.remove( {} ) - deletes all documents in the parts collection
37. SQL vs. MongoDB entities
MySQL
START TRANSACTION;
INSERT INTO contacts VALUES
(NULL, 'joeblow');
INSERT INTO contact_emails
VALUES
(NULL, "[email protected]",
LAST_INSERT_ID()),
(NULL, "[email protected]",
LAST_INSERT_ID());
COMMIT;
MongoDB
db.contacts.save( {
userName: "joeblow",
emailAddresses: [
"[email protected]",
"[email protected]" ] }
);
Similar to IDS from the 70’s
Bachman’s brainchild
DIFFERENCE:
MongoDB separates physical structure
from logical structure
Designed to deal with large & distributed
38. Aggregated functionality
Aggregation framework provides SQL-like aggregation
functionality
• Pipeline: documents from a collection pass through an
aggregation pipeline, which transforms these objects as they pass
through
• Expressions: produce output documents based on calculations
performed on input documents
• Example: db.parts.aggregate( [ { $group: { _id: "$type", totalquantity:
{ $sum: "$quantity" } } } ] )
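What a $group stage with $sum computes can be sketched in plain JavaScript. The helper groupSum and the sample data are hypothetical, and the first-seen output order is a property of this sketch only; MongoDB does not promise any particular order for $group results.

```javascript
// Hypothetical sample data standing in for the parts collection.
const inventory = [
  { type: "hammer", quantity: 1 },
  { type: "screwdriver", quantity: 15 },
  { type: "hammer", quantity: 4 },
];

// Rough analogue of {$group: {_id: "$type", totalquantity: {$sum: "$quantity"}}}
function groupSum(docs, keyField, sumField) {
  const totals = new Map();  // a Map keeps keys in first-seen order
  for (const doc of docs) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) || 0) + doc[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, totalquantity: total }));
}

const grouped = groupSum(inventory, "type", "quantity");
console.log(grouped);
// [ { _id: 'hammer', totalquantity: 5 }, { _id: 'screwdriver', totalquantity: 15 } ]
```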
39. Map reduce functionality
• Performs complex aggregation functions given a collection of
key/value pairs
• Must provide at least a map function, a reduce function, and the
name of the result set
• db.collection.mapReduce( <mapfunction>, <reducefunction>,
{ out: <collection>, query: <document>, sort: <document>,
limit: <number>, finalize: <function>, scope: <document>,
jsMode: <boolean>, verbose: <boolean> } )
• More description of map reduce next lecture
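The map and reduce phases can be sketched in memory as follows. This is an illustration of the flow only, not the MongoDB implementation: in the mongo shell the map function calls a global emit, whereas this sketch passes emit as an argument, and the orders data is hypothetical. As in MongoDB, reduce is skipped for keys that received a single value.

```javascript
// In-memory stand-in for the map/reduce flow (not the MongoDB implementation).
function mapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit);  // map phase: `this` is the document
  const out = [];
  for (const [key, values] of emitted) {
    // reduce phase: called once per key here, and skipped for single-value keys
    out.push({ _id: key, value: values.length === 1 ? values[0] : reduceFn(key, values) });
  }
  return out;
}

// Hypothetical sample data.
const orders = [
  { customer: "ann", amount: 10 },
  { customer: "bob", amount: 5 },
  { customer: "ann", amount: 7 },
];

const totals = mapReduce(
  orders,
  function (emit) { emit(this.customer, this.amount); },  // map: key by customer
  (key, values) => values.reduce((a, b) => a + b, 0)       // reduce: sum the amounts
);
console.log(totals);  // [ { _id: 'ann', value: 17 }, { _id: 'bob', value: 5 } ]
```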
40. Indexes: High performance read
• Typically used for frequently used queries
• Necessary when the total size of the documents exceeds the
amount of available RAM.
• Defined on the collection level
• Can be defined on 1 or more fields
• Composite index (SQL) = compound index (MongoDB)
• B-tree index
• Only 1 index can be used by the query optimizer when
retrieving data
• An index covers a query when it can match the query conditions and return
the results using only the index
• Uses the index alone to provide the results
41. Replication of data
• Ensures redundancy, backup, and automatic failover
• Counterpart of the recovery manager in an RDBMS
• Replication occurs through groups of servers known as replica
sets
• Primary – the server that client tasks direct updates to
• Secondaries – servers used for duplication of data
• A replica set can have at most 12 members (the limit in early MongoDB
versions; raised to 50 in MongoDB 3.0)
• Many different properties can be associated with a secondary, e.g.
secondary-only, hidden, delayed, arbiter, non-voting
• If the primary fails, the secondaries 'vote' to elect a new
primary
42. Consistency of data
• All read operations issued to the primary of a replica set are
consistent with the last write operation
• Reads to a primary have strict consistency
• Reads reflect the latest changes to the data
• Reads to a secondary have eventual consistency
• Updates propagate gradually
• If clients permit reads from secondaries, a client may read a
previous state of the database
• Failure occurs before the secondary nodes are updated
• System identifies when a rollback needs to occur
• Users are responsible for manually applying rollback changes
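The difference between a strictly consistent read on the primary and an eventually consistent read on a secondary can be shown with a toy model. Everything here is hypothetical (the class ToyReplicaSet and its replicate method stand in for replication catching up); it makes no claim about how MongoDB replicates internally.

```javascript
// Toy replica set: writes go to the primary and reach the secondary only on replicate().
class ToyReplicaSet {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = [];  // writes not yet copied to the secondary
  }
  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }
  replicate() {  // hypothetical stand-in for replication catching up
    for (const [key, value] of this.pending) this.secondary.set(key, value);
    this.pending = [];
  }
  read(key, fromSecondary = false) {
    return (fromSecondary ? this.secondary : this.primary).get(key);
  }
}

const rs = new ToyReplicaSet();
rs.write("quantity", 1);
rs.replicate();
rs.write("quantity", 2);                      // not yet replicated
const fromPrimary = rs.read("quantity");      // reflects the latest write
const staleRead = rs.read("quantity", true);  // secondary still holds the previous state
rs.replicate();
const caughtUp = rs.read("quantity", true);   // secondary has now converged
console.log(fromPrimary, staleRead, caughtUp);  // 2 1 2
```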
43. Provides Memory Mapped Files
• "A memory-mapped file is a segment of virtual memory which has
been assigned a direct byte-for-byte correlation with some portion
of a file or file-like resource." [1]
• mmap()
[1] http://en.wikipedia.org/wiki/Memory-mapped_file
46. Summary
• NoSQL built to address a distributed database system
• Sharding
• Replica sets of data
• CAP Theorem: consistency, availability, and partition tolerance
• MongoDB
• Document-oriented data, schema-less database, supports
secondary indexes, provides a query language, consistent reads
on the primary
• Lacks transactions, joins
47. Limited BNF of a BSON document
document ::= int32 e_list "\x00"          BSON Document
e_list   ::= element e_list               Sequence of elements
element  ::= "\x01" e_name data type      Specific data type
e_name   ::= cstring                      Key name
string   ::= int32 (byte*) "\x00"         String
cstring  ::= (byte*) "\x00"               CString
binary   ::= int32 subtype (byte*)        Binary
subtype  ::= "\x00"                       Binary / Generic
           | "\x01"                       Function
           | "\x02"                       Binary (Old)
           | "\x03"                       UUID (Old)
           | "\x04"                       UUID
           | "\x05"                       MD5
           | "\x80"                       User defined
code_w_s ::= int32 string document        Code w/ scope