SlideShare a Scribd company logo
Document Databases in NoSQL
- stores data in JSON, BSON, or XML documents. Follows Tree Structure
- move documents under one document, any particular elements can be indexed to run
queries faster.
- close to the data objects which are used in many applications which means very less
translations are required to use data in applications.
- JSON is a native language that is often used to store and query data too. So in the
document data model, each document has a key-value pair.
- works well with use cases such as catalogs, user profiles, and content management --
enable flexible indexing, powerful ad hoc queries, and analytics over collections of
documents.
Eg. Application
Catalogs
For example, in an e-commerce application,
products have different numbers of attributes.
Managing thousands of attributes in relational databases is inefficient, and the
reading performance is affected.
Using a document database, each product’s attributes can be described in a
single document for easy management and faster reading speed. Changing
the attributes of one product won’t affect others.
Module 2.3 Document Databases in NoSQL Systems
Features:
● Document Type Model: data is stored in documents rather than tables or
graphs, so it becomes easy to map things in many programming languages.
● Flexible Schema: not all documents in a collection need to have the same
fields.
● Distributed and Resilient: Document data models are very much dispersed
which is the reason behind horizontal scaling and distribution of data.
● Manageable Query Language: CRUD (Create Read Update Destroy)
operations on the data model.
Examples of Document Data Models :
● Amazon DocumentDB
● MongoDB (Document databases, such as CouchDB and MongoDB store entire documents in the form of
JSON objects. You can think of these objects as nested key-value pairs.)
● Cosmos DB
● ArangoDB
● Couchbase Server
● CouchDB
Advantages:
● Schema-less: no restrictions in the format and the structure of data storage.
● Faster creation of document and maintenance: simple to create a
document and apart from this maintenance requires is almost nothing.
● Open formats: simple build process that uses XML, JSON, and its other
forms.
● Built-in versioning: It has built-in versioning which means as the documents
grow in size there might be a chance they can grow in complexity. Versioning
decreases conflicts.
Disadvantages:
● Weak Atomicity: It lacks in supporting multi-document ACID transactions. A
change in the document data model involving two collections will require us to
run two separate queries i.e. one for each collection. This is where it breaks
atomicity requirements.
● Consistency Check Limitations: One can search the collections and
documents that are not connected to an author collection but doing this might
create a problem in the performance of database performance.
● Security: Nowadays many web applications lack security which in turn results
in the leakage of sensitive data. So it becomes a point of concern, one must
pay attention to web app vulnerabilities.
Applications of Document Data Model :
● Content Management: These data models are very much used in creating various
video streaming platforms, blogs, and similar services Because each is stored as a
single document and the database here is much easier to maintain as the service
evolves over time.
● Book Database: These are very much useful in making book databases because
as we know this data model lets us nest.
● Catalog: When it comes to storing and reading catalog files these data models are
very much used because it has a fast reading ability if incase Catalogs have
thousands of attributes stored.
● Analytics Platform: These data models are very much used in the Analytics
Platform.
Module 2.3 Document Databases in NoSQL Systems
Mongo DB sample operations.
● First N / Last N Rows
● Equal to / Not Equal to
● Equal / Not Equal to One of Them — IN
● Multiple Conditions with AND / OR
● Regular Expression — Text Matching — Contains, Starts / Ends With
● Is NA (Null) or Not NA
● Filter with Numeric Columns
● Filter with Date/Time Columns
● Filter with Array Columns
● Filter with ‘Array of Documents’ Columns
Why Graph database?
Graph Database : the relationships between data are prioritized.
data is used where one data is connected directly or indirectly.
join operations are not required in this database which reduces the cost.
The relationships and properties are stored as first-class entities in Graph Database.
Allow organizations to connect the data with external sources as well.
if the organization wants to find a particular data that is connected with another data in another
table, so first join operation is performed between the tables, and then search for the data is done
row by row.
Graph database solves this big problem. They store the relationships and properties along with
the data. So if the organization needs to search for a particular data, then with the help of
relationships and properties the nodes can be found without joining or without traversing row by
row. Thus the searching of nodes is not dependent on the amount of data.
● Nodes: represent the objects or instances. They are equivalent to a row in
database. The node basically acts as a vertex in a graph. The nodes are
grouped by applying a label to each member.
● Relationships: They are basically the edges in the graph. They have a
specific direction, type and form patterns of the data. They basically establish
relationship between nodes.
● Properties: They are the information associated with the nodes.
Eg. Neo4j, Oracle NoSQL DB, Graph base
Types of Graph Databases:
● Property Graphs: These graphs are used for querying and analyzing data by
modelling the relationships among the data. It comprises of vertices that has
information about the particular subject and edges that denote the relationship.
The vertices and edges have additional attributes called properties.
● RDF Graphs: It stands for Resource Description Framework. It focuses more
on data integration. They are used to represent complex data with well defined
semantics. It is represented by three elements: two vertices, an edge that reflect
the subject, predicate and object of a sentence. Every vertex and edge is
represented by URI(Uniform Resource Identifier).
When to Use Graph Database?
● Graph databases should be used for heavily interconnected data.
● It should be used when amount of data is larger and relationships are present.
● It can be used to represent the cohesive picture of the data.
Module 2.3 Document Databases in NoSQL Systems
Module 2.3 Document Databases in NoSQL Systems
How Graph and Graph Databases Work?
Graph databases provide graph models They allow users to
perform traversal queries since data is connected. Graph
algorithms are also applied to find patterns, paths and other
relationships this enabling more analysis of the data. The
algorithms help to explore the neighboring nodes, clustering
of vertices analyze relationships and patterns. Countless joins
are not required in this kind of database.
Example of Graph Database:
● Recommendation engines in E commerce use graph databases to provide customers with
accurate recommendations, updates about new products thus increasing sales and satisfying the
customer’s desires.
● Social media companies use graph databases to find the “friends of friends” or products that the
user’s friends like and send suggestions accordingly to user.
● To detect fraud Graph databases play a major role. Users can create graph from the transactions
between entities and store other important information. Once created, running a simple query will
help to identify the fraud.
Advantages of Graph Database:
● Potential advantage of Graph Database is establishing the relationships with external sources as
well
● No joins are required since relationships is already specified.
● Query is dependent on concrete relationships and not on the amount of data.
● It is flexible and agile.
● it is easy to manage the data in terms of graph.
Module 2.3 Document Databases in NoSQL Systems
Neo4j : data model
● Person nodes, which have the following properties: name and born.
● Movie nodes, which have the following properties: title, released, and tagline.
● The data model also contains five different relationship types between
the Person and Movie nodes: ACTED_IN, DIRECTED, PRODUCED, WROTE
, and REVIEWED. Two of the relationship types have properties:
● The ACTED_IN relationship type, which has the roles property.
● The REVIEWED relationship type, which has a summary property and
a rating property.
Sample query statements in Cypher query language
CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person {name:'Hugo Weaving', born:1960})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})
CREATE (LanaW:Person {name:'Lana Wachowski', born:1965})
CREATE (JoelS:Person {name:'Joel Silver', born:1952})
CREATE
(Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix),
(Carrie)-[:ACTED_IN {roles:['Trinity']}]->(TheMatrix),
(Laurence)-[:ACTED_IN {roles:['Morpheus']}]->(TheMatrix),
(Hugo)-[:ACTED_IN {roles:['Agent Smith']}]->(TheMatrix),
(LillyW)-[:DIRECTED]->(TheMatrix),
(LanaW)-[:DIRECTED]->(TheMatrix),
(JoelS)-[:PRODUCED]->(TheMatrix)
● Finding nodes
find the nodes with Person label and the name Keanu Reeves, and return
the name and born properties of the found nodes
MATCH (keanu:Person {name:'Keanu Reeves'})
RETURN keanu.name AS name, keanu.born AS born
Table 1. Result
name born
"Keanu Reeves" 1964
MATCH (tom:Person
{name:'Tom Hanks'})-[r]-
>(m:Movie)
RETURN type(r) AS type,
m.title AS movie
To be a good candidate for a general class of big data
problems, NoSQL solutions should
 Be efficient with input and output and scale linearly with growing data size.
 Be operationally efficient. Organizations can’t afford to hire many people to run
the servers.
 Require that reports and analyses be performed by nonprogrammers using
simple tools—not every business can afford a full-time Java programmer to write
on-demand queries.
 Meet the challenges of distributed computing, including consideration of latency
between systems and eventual node failures.
 Meet both the needs of overnight batch processing economy-of-scale and time
critical event processing
● For graph traversals to be fast, the entire graph should be in main memory.
This is why graph stores work most efficiently when you have enough RAM to
hold the graph. If you can’t keep your graph in RAM, graph stores will try to
swap the data to disk, which will decrease graph query performance by a
factor of 1,000.
● The only way to combat the problem is to move to a shared-memory
architecture, where multiple threads all access a large RAM structure without
the graph data moving outside of the shared RAM
Analyzing big data with a shared-nothing architecture
There are three ways that resources can be shared between computer systems:
1. shared RAM,
2. shared disk, and
3. shared-nothing
Choosing distribution models: master-slave versus peer-to-peer
● The disadvantage of peer-to peer networks is that there’s an increased
complexity and communication overhead that must occur for all nodes to be
kept up to date with the cluster status.
● Using the right distribution model will depend on your business requirements:
if high availability is a concern, a peer-to-peer network might be the best
solution. If you can manage your big data using batch jobs that run in off
hours, then the simpler master-slave model might be best.
Module 2.3 Document Databases in NoSQL Systems
Variations of NoSQL Architectural Patterns
● Database architecture : distributed (manages single database distributed in multiple servers
located at various sites) or federated (manages independent and heterogeneous databases at
multiple sites).
● In Internet of Things (IoT) architecture a virtual sensor has to integrate multiple data streams from
real sensors into a single data stream.
Results : stored temporarily or stored permanently
Uses data and sources-centric IoT middleware.
● Scalable architectural patterns can be used to form new scalable architectures. For example, a
combination of load balancer and shared-nothing architecture; distributed Hash Table and Content
Addressable network (Cassandra); Publish/Subscribe (EventJava); etc.
Variations of NoSQL Architectural Patterns
● The variations in architecture are based on system requirements like agility,
availability (any-where, anytime), intelligence, scalability, collaboration and
low latency.
● Agility is given as a service using virtualization or cloud computing;
● availability is the service given by internet and mobility;
● intelligence is given by machine learning and predictive analytics;
● scalability (flexibility of using commodity machines) is given by Big Data
Technologies/cloud platforms;
● collaboration is given by (enterprise-wide) social network application; and
● low latency (event driven) is provided by in- memory databases.
Analyzing Big Data with a Shared Nothing Architecture
● distributed computing architecture : resource sharing possible or share
nothing.
● shared RAM, shared disk, and shared-nothing.
● In shared RAM, many CPUs access a single shared RAM over a high-speed bus.
This system is ideal for large computation and also for graph stores. For graph
traversals to be fast, the entire graph should be in main memory.
● The shared disk system, processors have independent RAM but shares disk space
using a storage area network (SAN). Big data uses commodity machines which
shares nothing (shares no resources).
● (key−value store and document store) are cache-friendly.
● Row stores and graph stores are not cache-friendly
Choosing Distribution Models
● There are two styles of distributing data: Sharding and replication.
● Riak database shards the data and also replicates it.
● Sharding: horizontal partitioning
● Replication: Master−slave replication, Peer-to-peer replication.
● The right distribution model is chosen based on the business requirement. For
example, for batch processing jobs, master-slave is chosen and for high
availability jobs, peer-to-peer distribution model is chosen.
● Hadoop’s initial version has master−slave architecture
● Later versions removed SPOF. HBase has a master−slave model, while
Cassandra has a peer-to-peer model.
When to use which store?
● Key – Value : constant stream of small R-W
● Document DB : natural data modelling, Programmer friendly , CRUD.
● Columnar : Massive write load, High availability, Map Reduce.
● Graph : Graph algorithms and relations
Ad

More Related Content

Similar to Module 2.3 Document Databases in NoSQL Systems (20)

Nosql
NosqlNosql
Nosql
Roxana Tadayon
 
Database
DatabaseDatabase
Database
Hossein Mobasher
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
Suvradeep Rudra
 
Database.pdf
Database.pdfDatabase.pdf
Database.pdf
l235546
 
Analysis on NoSQL: MongoDB Tool
Analysis on NoSQL: MongoDB ToolAnalysis on NoSQL: MongoDB Tool
Analysis on NoSQL: MongoDB Tool
ijtsrd
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB Database
Tariqul islam
 
Bloc Basic DBMS in detailed information
Bloc  Basic DBMS in detailed informationBloc  Basic DBMS in detailed information
Bloc Basic DBMS in detailed information
parthbarach2005
 
Choosing your NoSQL storage
Choosing your NoSQL storageChoosing your NoSQL storage
Choosing your NoSQL storage
Imteyaz Khan
 
Distilled Module1_Chapter2_NoSQL.pptx
Distilled    Module1_Chapter2_NoSQL.pptxDistilled    Module1_Chapter2_NoSQL.pptx
Distilled Module1_Chapter2_NoSQL.pptx
vivekananda34
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
Max Neunhöffer
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
Prakash Zodge
 
2.Introduction to NOSQL (Core concepts).pptx
2.Introduction to NOSQL (Core concepts).pptx2.Introduction to NOSQL (Core concepts).pptx
2.Introduction to NOSQL (Core concepts).pptx
RushikeshChikane2
 
DBMS unit 1.pptx
DBMS unit 1.pptxDBMS unit 1.pptx
DBMS unit 1.pptx
ssuserc8e1481
 
Graph Database and Why it is gaining traction
Graph Database and Why it is gaining tractionGraph Database and Why it is gaining traction
Graph Database and Why it is gaining traction
Giridhar Chandrasekaran
 
data base system to new data science lerne
data base system to new data science lernedata base system to new data science lerne
data base system to new data science lerne
tarunprajapati0t
 
WEB_DATABASE_chapter_4.pptx
WEB_DATABASE_chapter_4.pptxWEB_DATABASE_chapter_4.pptx
WEB_DATABASE_chapter_4.pptx
Koteswari Kasireddy
 
NoSql
NoSqlNoSql
NoSql
AnitaSenthilkumar
 
moving_from_relational_to_nosql_couchbase_2016
moving_from_relational_to_nosql_couchbase_2016moving_from_relational_to_nosql_couchbase_2016
moving_from_relational_to_nosql_couchbase_2016
Richard (Rick) Nelson
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talk
benosteen
 
Ramya ppt.pptx
Ramya ppt.pptxRamya ppt.pptx
Ramya ppt.pptx
RRamyaDevi
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
Suvradeep Rudra
 
Database.pdf
Database.pdfDatabase.pdf
Database.pdf
l235546
 
Analysis on NoSQL: MongoDB Tool
Analysis on NoSQL: MongoDB ToolAnalysis on NoSQL: MongoDB Tool
Analysis on NoSQL: MongoDB Tool
ijtsrd
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB Database
Tariqul islam
 
Bloc Basic DBMS in detailed information
Bloc  Basic DBMS in detailed informationBloc  Basic DBMS in detailed information
Bloc Basic DBMS in detailed information
parthbarach2005
 
Choosing your NoSQL storage
Choosing your NoSQL storageChoosing your NoSQL storage
Choosing your NoSQL storage
Imteyaz Khan
 
Distilled Module1_Chapter2_NoSQL.pptx
Distilled    Module1_Chapter2_NoSQL.pptxDistilled    Module1_Chapter2_NoSQL.pptx
Distilled Module1_Chapter2_NoSQL.pptx
vivekananda34
 
2.Introduction to NOSQL (Core concepts).pptx
2.Introduction to NOSQL (Core concepts).pptx2.Introduction to NOSQL (Core concepts).pptx
2.Introduction to NOSQL (Core concepts).pptx
RushikeshChikane2
 
Graph Database and Why it is gaining traction
Graph Database and Why it is gaining tractionGraph Database and Why it is gaining traction
Graph Database and Why it is gaining traction
Giridhar Chandrasekaran
 
data base system to new data science lerne
data base system to new data science lernedata base system to new data science lerne
data base system to new data science lerne
tarunprajapati0t
 
moving_from_relational_to_nosql_couchbase_2016
moving_from_relational_to_nosql_couchbase_2016moving_from_relational_to_nosql_couchbase_2016
moving_from_relational_to_nosql_couchbase_2016
Richard (Rick) Nelson
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talk
benosteen
 
Ramya ppt.pptx
Ramya ppt.pptxRamya ppt.pptx
Ramya ppt.pptx
RRamyaDevi
 

More from NiramayKolalle (6)

Module 1.2: Data Warehousing Fundamentals.pptx
Module 1.2:  Data Warehousing Fundamentals.pptxModule 1.2:  Data Warehousing Fundamentals.pptx
Module 1.2: Data Warehousing Fundamentals.pptx
NiramayKolalle
 
MODULE 1: Introduction to Big Data Analytics.pptx
MODULE 1: Introduction to Big Data Analytics.pptxMODULE 1: Introduction to Big Data Analytics.pptx
MODULE 1: Introduction to Big Data Analytics.pptx
NiramayKolalle
 
Module 2.2 Introduction to NoSQL Databases.pptx
Module 2.2 Introduction to NoSQL Databases.pptxModule 2.2 Introduction to NoSQL Databases.pptx
Module 2.2 Introduction to NoSQL Databases.pptx
NiramayKolalle
 
Web mining .pdf module 6 dwm third year ce
Web mining .pdf module 6 dwm third year ceWeb mining .pdf module 6 dwm third year ce
Web mining .pdf module 6 dwm third year ce
NiramayKolalle
 
SPCC_Sem6_Chapter 6_Code Optimization part
SPCC_Sem6_Chapter 6_Code Optimization partSPCC_Sem6_Chapter 6_Code Optimization part
SPCC_Sem6_Chapter 6_Code Optimization part
NiramayKolalle
 
Module 6 Intermediate Code Generation.pdf
Module 6 Intermediate Code Generation.pdfModule 6 Intermediate Code Generation.pdf
Module 6 Intermediate Code Generation.pdf
NiramayKolalle
 
Module 1.2: Data Warehousing Fundamentals.pptx
Module 1.2:  Data Warehousing Fundamentals.pptxModule 1.2:  Data Warehousing Fundamentals.pptx
Module 1.2: Data Warehousing Fundamentals.pptx
NiramayKolalle
 
MODULE 1: Introduction to Big Data Analytics.pptx
MODULE 1: Introduction to Big Data Analytics.pptxMODULE 1: Introduction to Big Data Analytics.pptx
MODULE 1: Introduction to Big Data Analytics.pptx
NiramayKolalle
 
Module 2.2 Introduction to NoSQL Databases.pptx
Module 2.2 Introduction to NoSQL Databases.pptxModule 2.2 Introduction to NoSQL Databases.pptx
Module 2.2 Introduction to NoSQL Databases.pptx
NiramayKolalle
 
Web mining .pdf module 6 dwm third year ce
Web mining .pdf module 6 dwm third year ceWeb mining .pdf module 6 dwm third year ce
Web mining .pdf module 6 dwm third year ce
NiramayKolalle
 
SPCC_Sem6_Chapter 6_Code Optimization part
SPCC_Sem6_Chapter 6_Code Optimization partSPCC_Sem6_Chapter 6_Code Optimization part
SPCC_Sem6_Chapter 6_Code Optimization part
NiramayKolalle
 
Module 6 Intermediate Code Generation.pdf
Module 6 Intermediate Code Generation.pdfModule 6 Intermediate Code Generation.pdf
Module 6 Intermediate Code Generation.pdf
NiramayKolalle
 
Ad

Recently uploaded (20)

Smart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineeringSmart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineering
rushikeshnavghare94
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
Compiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptxCompiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptx
RushaliDeshmukh2
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
The Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLabThe Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLab
Journal of Soft Computing in Civil Engineering
 
IntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdfIntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdf
Luiz Carneiro
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
Raish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdfRaish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdf
RaishKhanji
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
new ppt artificial intelligence historyyy
new ppt artificial intelligence historyyynew ppt artificial intelligence historyyy
new ppt artificial intelligence historyyy
PianoPianist
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)
rccbatchplant
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Infopitaara
 
Smart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineeringSmart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineering
rushikeshnavghare94
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
Compiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptxCompiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptx
RushaliDeshmukh2
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
IntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdfIntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdf
Luiz Carneiro
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
Raish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdfRaish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdf
RaishKhanji
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
new ppt artificial intelligence historyyy
new ppt artificial intelligence historyyynew ppt artificial intelligence historyyy
new ppt artificial intelligence historyyy
PianoPianist
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)
rccbatchplant
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Infopitaara
 
Ad

Module 2.3 Document Databases in NoSQL Systems

  • 1. Document Databases in NoSQL - stores data in JSON, BSON, or XML documents. Follows Tree Structure - move documents under one document, any particular elements can be indexed to run queries faster. - close to the data objects which are used in many applications which means very less translations are required to use data in applications. - JSON is a native language that is often used to store and query data too. So in the document data model, each document has a key-value pair. - works well with use cases such as catalogs, user profiles, and content management -- enable flexible indexing, powerful ad hoc queries, and analytics over collections of documents.
  • 2. Eg. Application Catalogs For example, in an e-commerce application, products have different numbers of attributes. Managing thousands of attributes in relational databases is inefficient, and the reading performance is affected. Using a document database, each product’s attributes can be described in a single document for easy management and faster reading speed. Changing the attributes of one product won’t affect others.
  • 4. Features: ● Document Type Model: data is stored in documents rather than tables or graphs, so it becomes easy to map things in many programming languages. ● Flexible Schema: not all documents in a collection need to have the same fields. ● Distributed and Resilient: Document data models are very much dispersed which is the reason behind horizontal scaling and distribution of data. ● Manageable Query Language: CRUD (Create Read Update Destroy) operations on the data model.
  • 5. Examples of Document Data Models : ● Amazon DocumentDB ● MongoDB (Document databases, such as CouchDB and MongoDB store entire documents in the form of JSON objects. You can think of these objects as nested key-value pairs.) ● Cosmos DB ● ArangoDB ● Couchbase Server ● CouchDB
  • 6. Advantages: ● Schema-less: no restrictions in the format and the structure of data storage. ● Faster creation of document and maintenance: simple to create a document and apart from this maintenance requires is almost nothing. ● Open formats: simple build process that uses XML, JSON, and its other forms. ● Built-in versioning: It has built-in versioning which means as the documents grow in size there might be a chance they can grow in complexity. Versioning decreases conflicts.
  • 7. Disadvantages: ● Weak Atomicity: It lacks in supporting multi-document ACID transactions. A change in the document data model involving two collections will require us to run two separate queries i.e. one for each collection. This is where it breaks atomicity requirements. ● Consistency Check Limitations: One can search the collections and documents that are not connected to an author collection but doing this might create a problem in the performance of database performance. ● Security: Nowadays many web applications lack security which in turn results in the leakage of sensitive data. So it becomes a point of concern, one must pay attention to web app vulnerabilities.
  • 8. Applications of Document Data Model : ● Content Management: These data models are very much used in creating various video streaming platforms, blogs, and similar services Because each is stored as a single document and the database here is much easier to maintain as the service evolves over time. ● Book Database: These are very much useful in making book databases because as we know this data model lets us nest. ● Catalog: When it comes to storing and reading catalog files these data models are very much used because it has a fast reading ability if incase Catalogs have thousands of attributes stored. ● Analytics Platform: These data models are very much used in the Analytics Platform.
  • 10. Mongo DB sample operations. ● First N / Last N Rows ● Equal to / Not Equal to ● Equal / Not Equal to One of Them — IN ● Multiple Conditions with AND / OR ● Regular Expression — Text Matching — Contains, Starts / Ends With ● Is NA (Null) or Not NA ● Filter with Numeric Columns ● Filter with Date/Time Columns ● Filter with Array Columns ● Filter with ‘Array of Documents’ Columns
  • 11. Why Graph database? Graph Database : the relationships between data are prioritized. data is used where one data is connected directly or indirectly. join operations are not required in this database which reduces the cost. The relationships and properties are stored as first-class entities in Graph Database. Allow organizations to connect the data with external sources as well. if the organization wants to find a particular data that is connected with another data in another table, so first join operation is performed between the tables, and then search for the data is done row by row. Graph database solves this big problem. They store the relationships and properties along with the data. So if the organization needs to search for a particular data, then with the help of relationships and properties the nodes can be found without joining or without traversing row by row. Thus the searching of nodes is not dependent on the amount of data.
  • 12. ● Nodes: represent the objects or instances. They are equivalent to a row in database. The node basically acts as a vertex in a graph. The nodes are grouped by applying a label to each member. ● Relationships: They are basically the edges in the graph. They have a specific direction, type and form patterns of the data. They basically establish relationship between nodes. ● Properties: They are the information associated with the nodes. Eg. Neo4j, Oracle NoSQL DB, Graph base
  • 13. Types of Graph Databases: ● Property Graphs: These graphs are used for querying and analyzing data by modelling the relationships among the data. It comprises of vertices that has information about the particular subject and edges that denote the relationship. The vertices and edges have additional attributes called properties. ● RDF Graphs: It stands for Resource Description Framework. It focuses more on data integration. They are used to represent complex data with well defined semantics. It is represented by three elements: two vertices, an edge that reflect the subject, predicate and object of a sentence. Every vertex and edge is represented by URI(Uniform Resource Identifier).
  • 14. When to Use Graph Database? ● Graph databases should be used for heavily interconnected data. ● It should be used when amount of data is larger and relationships are present. ● It can be used to represent the cohesive picture of the data.
  • 17. How Graph and Graph Databases Work? Graph databases provide graph models They allow users to perform traversal queries since data is connected. Graph algorithms are also applied to find patterns, paths and other relationships this enabling more analysis of the data. The algorithms help to explore the neighboring nodes, clustering of vertices analyze relationships and patterns. Countless joins are not required in this kind of database.
  • 18. Example of Graph Database: ● Recommendation engines in E commerce use graph databases to provide customers with accurate recommendations, updates about new products thus increasing sales and satisfying the customer’s desires. ● Social media companies use graph databases to find the “friends of friends” or products that the user’s friends like and send suggestions accordingly to user. ● To detect fraud Graph databases play a major role. Users can create graph from the transactions between entities and store other important information. Once created, running a simple query will help to identify the fraud. Advantages of Graph Database: ● Potential advantage of Graph Database is establishing the relationships with external sources as well ● No joins are required since relationships is already specified. ● Query is dependent on concrete relationships and not on the amount of data. ● It is flexible and agile. ● it is easy to manage the data in terms of graph.
  • 20. Neo4j : data model ● Person nodes, which have the following properties: name and born. ● Movie nodes, which have the following properties: title, released, and tagline.
  • 21. ● The data model also contains five different relationship types between the Person and Movie nodes: ACTED_IN, DIRECTED, PRODUCED, WROTE , and REVIEWED. Two of the relationship types have properties: ● The ACTED_IN relationship type, which has the roles property. ● The REVIEWED relationship type, which has a summary property and a rating property.
  • 22. Sample query statements in Cypher query language CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'}) CREATE (Keanu:Person {name:'Keanu Reeves', born:1964}) CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967}) CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961}) CREATE (Hugo:Person {name:'Hugo Weaving', born:1960}) CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967}) CREATE (LanaW:Person {name:'Lana Wachowski', born:1965}) CREATE (JoelS:Person {name:'Joel Silver', born:1952}) CREATE (Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix), (Carrie)-[:ACTED_IN {roles:['Trinity']}]->(TheMatrix), (Laurence)-[:ACTED_IN {roles:['Morpheus']}]->(TheMatrix), (Hugo)-[:ACTED_IN {roles:['Agent Smith']}]->(TheMatrix), (LillyW)-[:DIRECTED]->(TheMatrix), (LanaW)-[:DIRECTED]->(TheMatrix), (JoelS)-[:PRODUCED]->(TheMatrix)
  • 23. ● Finding nodes find the nodes with Person label and the name Keanu Reeves, and return the name and born properties of the found nodes MATCH (keanu:Person {name:'Keanu Reeves'}) RETURN keanu.name AS name, keanu.born AS born Table 1. Result name born "Keanu Reeves" 1964
  • 25. To be a good candidate for a general class of big data problems, NoSQL solutions should  Be efficient with input and output and scale linearly with growing data size.  Be operationally efficient. Organizations can’t afford to hire many people to run the servers.  Require that reports and analyses be performed by nonprogrammers using simple tools—not every business can afford a full-time Java programmer to write on-demand queries.  Meet the challenges of distributed computing, including consideration of latency between systems and eventual node failures.  Meet both the needs of overnight batch processing economy-of-scale and time critical event processing
  • 26. ● For graph traversals to be fast, the entire graph should be in main memory. This is why graph stores work most efficiently when you have enough RAM to hold the graph. If you can’t keep your graph in RAM, graph stores will try to swap the data to disk, which will decrease graph query performance by a factor of 1,000. ● The only way to combat the problem is to move to a shared-memory architecture, where multiple threads all access a large RAM structure without the graph data moving outside of the shared RAM
  • 27. Analyzing big data with a shared-nothing architecture There are three ways that resources can be shared between computer systems: 1. shared RAM, 2. shared disk, and 3. shared-nothing
  • 28. Choosing distribution models: master-slave versus peer-to-peer
  • 29. ● The disadvantage of peer-to peer networks is that there’s an increased complexity and communication overhead that must occur for all nodes to be kept up to date with the cluster status. ● Using the right distribution model will depend on your business requirements: if high availability is a concern, a peer-to-peer network might be the best solution. If you can manage your big data using batch jobs that run in off hours, then the simpler master-slave model might be best.
  • 31. Variations of NoSQL Architectural Patterns ● Database architecture : distributed (manages single database distributed in multiple servers located at various sites) or federated (manages independent and heterogeneous databases at multiple sites). ● In Internet of Things (IoT) architecture a virtual sensor has to integrate multiple data streams from real sensors into a single data stream. Results : stored temporarily or stored permanently Uses data and sources-centric IoT middleware. ● Scalable architectural patterns can be used to form new scalable architectures. For example, a combination of load balancer and shared-nothing architecture; distributed Hash Table and Content Addressable network (Cassandra); Publish/Subscribe (EventJava); etc.
  • 32. Variations of NoSQL Architectural Patterns ● The variations in architecture are based on system requirements like agility, availability (any-where, anytime), intelligence, scalability, collaboration and low latency. ● Agility is given as a service using virtualization or cloud computing; ● availability is the service given by internet and mobility; ● intelligence is given by machine learning and predictive analytics; ● scalability (flexibility of using commodity machines) is given by Big Data Technologies/cloud platforms; ● collaboration is given by (enterprise-wide) social network application; and ● low latency (event driven) is provided by in- memory databases.
  • 33. Analyzing Big Data with a Shared Nothing Architecture ● distributed computing architecture : resource sharing possible or share nothing. ● shared RAM, shared disk, and shared-nothing. ● In shared RAM, many CPUs access a single shared RAM over a high-speed bus. This system is ideal for large computation and also for graph stores. For graph traversals to be fast, the entire graph should be in main memory. ● The shared disk system, processors have independent RAM but shares disk space using a storage area network (SAN). Big data uses commodity machines which shares nothing (shares no resources). ● (key−value store and document store) are cache-friendly. ● Row stores and graph stores are not cache-friendly
  • 34. Choosing Distribution Models ● There are two styles of distributing data: Sharding and replication. ● Riak database shards the data and also replicates it. ● Sharding: horizontal partitioning ● Replication: Master−slave replication, Peer-to-peer replication. ● The right distribution model is chosen based on the business requirement. For example, for batch processing jobs, master-slave is chosen and for high availability jobs, peer-to-peer distribution model is chosen. ● Hadoop’s initial version has master−slave architecture ● Later versions removed SPOF. HBase has a master−slave model, while Cassandra has a peer-to-peer model.
  • 35. When to use which store? ● Key – Value : constant stream of small R-W ● Document DB : natural data modelling, Programmer friendly , CRUD. ● Columnar : Massive write load, High availability, Map Reduce. ● Graph : Graph algorithms and relations