Apache Cassandra Sample Material
VS-1046
1. INTRODUCTION TO NOSQL
NoSQL databases try to offer certain functionality that more traditional relational database
management systems do not. Whether it is for holding simple key-value pairs for short
periods of time for caching purposes, or for keeping unstructured collections of data that
could not easily be handled with relational databases and the structured query language
(SQL), they are here to help.
1.1. NoSQL Basics
A NoSQL (originally referring to "non SQL", "non relational" or "not only SQL") database
provides a mechanism for storage and retrieval of data which is modeled in means other
than the tabular relations used in relational databases. Such databases have existed since
the late 1960s, but did not obtain the "NoSQL" moniker until a surge of popularity in the
early twenty-first century, triggered by the needs of Web 2.0 companies such as Facebook,
Google, and Amazon.com. NoSQL databases are increasingly used in big data and real-
time web applications. NoSQL systems are also sometimes called "Not only SQL" to
emphasize that they may support SQL-like query languages.
Motivations for this approach include: simplicity of design, simpler "horizontal" scaling to
clusters of machines (which is a problem for relational databases), and finer control over
availability. The data structures used by NoSQL databases (e.g. key-value, wide column,
graph, or document) are different from those used by default in relational databases,
making some operations faster in NoSQL. The particular suitability of a given NoSQL
database depends on the problem it must solve. Sometimes the data structures used by
NoSQL databases are also viewed as "more flexible" than relational database tables.
Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor
of availability, partition tolerance, and speed. Barriers to the greater adoption of NoSQL
stores include the use of low-level query languages instead of SQL (for instance, the inability
to perform ad-hoc joins across tables), the lack of standardized interfaces, and huge
previous investments in existing relational databases. Most NoSQL stores lack true ACID
transactions, although a few databases, such as MarkLogic, Aerospike, FairCom c-treeACE,
Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB
have made them central to their designs.
Instead, most NoSQL databases offer a concept of "eventual consistency" in which database
changes are propagated to all nodes "eventually" (typically within milliseconds) so queries
for data might not return updated data immediately or might result in reading data that is
not accurate, a problem known as stale reads. Additionally, some NoSQL systems may
exhibit lost writes and other forms of data loss. Fortunately, some NoSQL systems provide
concepts such as write-ahead logging to avoid data loss. For distributed transaction
processing across multiple databases, data consistency is an even bigger challenge that is
difficult for both NoSQL and relational databases. Even current relational databases "do
not allow referential integrity constraints to span databases." There are few systems that
maintain both ACID transactions and X/Open XA standards for distributed transaction
processing.
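The stale-read problem described above can be simulated in a few lines. This is a toy model, not how any particular database is implemented: the `Replica` class, the `write` and `propagate` helpers, and the integer timestamps are all hypothetical. Conflicting versions are reconciled by keeping the write with the newest timestamp (last-write-wins), which is the rule Cassandra applies; other NoSQL systems may reconcile differently.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data, storing key -> (timestamp, value)."""
    name: str
    data: dict = field(default_factory=dict)

    def apply(self, key, value, ts):
        # Last-write-wins: accept the write only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def write(replicas, key, value, ts, reachable):
    """Apply a write only to the replicas that are currently reachable."""
    for r in replicas:
        if r.name in reachable:
            r.apply(key, value, ts)


def propagate(replicas):
    """Anti-entropy pass: every replica receives every other replica's entries."""
    for src in replicas:
        for key, (ts, value) in list(src.data.items()):
            for dst in replicas:
                dst.apply(key, value, ts)


nodes = [Replica("a"), Replica("b"), Replica("c")]
write(nodes, "cart:42", "empty", ts=1, reachable={"a", "b", "c"})
write(nodes, "cart:42", "1 item", ts=2, reachable={"a", "b"})  # "c" misses this write

print(nodes[2].read("cart:42"))  # empty   (stale read from the lagging replica)
propagate(nodes)
print(nodes[2].read("cart:42"))  # 1 item  (replicas have converged)
```

Until `propagate` runs, a client that happens to read from replica "c" sees the old cart, which is exactly the stale read described above; once changes have propagated, every replica returns the same value.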
Types and examples of NoSQL databases
There have been various approaches to classify NoSQL databases, each with different
categories and subcategories, some of which overlap. What follows is a basic classification
by data model, with examples:
Column: Accumulo, Cassandra, Druid, HBase, Vertica, SAP HANA
Document: Apache CouchDB, ArangoDB, Clusterpoint, Couchbase,
DocumentDB, HyperDex, IBM Domino, MarkLogic, MongoDB, OrientDB,
Qizx, RethinkDB
Key-value: Aerospike, ArangoDB, Couchbase, Dynamo, FairCom c-treeACE,
FoundationDB, HyperDex, MemcacheDB, MUMPS, Oracle NoSQL Database,
OrientDB, Redis, Riak, Berkeley DB
Graph: AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic,
Neo4J, OrientDB, Virtuoso, Stardog
Multi-model: Alchemy Database, ArangoDB, CortexDB, Couchbase,
FoundationDB, MarkLogic, OrientDB
By design, NoSQL databases and management systems are relation-less (or schema-less).
They are not based on a single model (e.g. the relational model of RDBMSs); instead, each
database adopts a different model depending on its target functionality.
There are a handful of different operational models and functioning systems for NoSQL
databases:
Key / Value: e.g. Redis, MemcacheDB, etc.
Column: e.g. Cassandra, HBase, etc.
Document: e.g. MongoDB, Couchbase, etc.
Graph: e.g. OrientDB, Neo4J, etc.
In order to better understand the roles and underlying technology of each database
management system, let's quickly go over these four operational models.
Key / Value Based
We will begin our NoSQL modeling journey with key / value based database management
systems, simply because they can be considered the most basic, backbone implementation of
NoSQL.
These types of databases work by matching keys with values, similar to a dictionary. There
is no structure or relation. After connecting to the database server (e.g. Redis), an
application can state a key (e.g. the_answer_to_life) and provide a matching value (e.g. 42),
which can later be retrieved the same way by supplying the key.
Key / value DBMSs are usually used for quickly storing basic information, and sometimes
not-so-basic information, such as the result of a CPU- and memory-intensive computation.
They are extremely performant, efficient and usually easily scalable.
In computing, a dictionary usually refers to a special kind of data object: a collection in
which each individual key maps to a value.
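To make the key / value model concrete, here is a minimal in-memory sketch in Python. It is not Redis or any real product: the `KeyValueStore` class, its `set`/`get` methods, the optional per-key expiry (TTL) and the `cached` helper are all hypothetical, chosen to show both plain storage and the caching of an expensive result mentioned above. Real stores such as Redis offer per-key expiry natively.

```python
import time

_MISSING = object()  # sentinel so that a stored None is not mistaken for a cache miss


class KeyValueStore:
    """A toy in-memory key/value store with optional per-key expiry (TTL)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl=None):
        """Store value under key; if ttl (seconds) is given, it expires afterwards."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the value for key, or default if it is missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return default
        return value


def cached(store, key, compute, ttl=60):
    """Cache-aside: return the stored result, computing and storing it on a miss."""
    value = store.get(key, _MISSING)
    if value is _MISSING:
        value = compute()
        store.set(key, value, ttl=ttl)
    return value


store = KeyValueStore()
store.set("the_answer_to_life", 42)
print(store.get("the_answer_to_life"))  # 42

# The expensive computation runs once; later calls within 60 s hit the store.
total = cached(store, "sum_below_1e6", lambda: sum(range(1_000_000)))
```

Keeping the clock injectable is a design choice for testability; a production cache would also bound its memory use, which this sketch does not.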
Column Based
Column based NoSQL database management systems work by extending the simple
nature of key / value based ones.
Despite their reputation online for being hard to understand, these databases work very
simply: they create collections of one or more key / value pairs that match a record.
Unlike the traditional, predefined schemas of relational databases, column-based NoSQL
solutions do not require a pre-structured table to work with the data. Each record comes
with one or more columns containing the information, and each column of each record can
be different.
Basically, column-based NoSQL databases are two-dimensional arrays in which each key
(i.e. row / record) has one or more key / value pairs attached to it. These management
systems allow very large and unstructured data to be kept and used (e.g. a record with tons
of information).
These databases are commonly used when simple key / value pairs are not enough, and
storing very large numbers of records with very large amounts of information is a must.
DBMSs implementing column-based, schema-less models can scale extremely well.
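The "two-dimensional array" picture above can be sketched as a map from row key to a map of column names to values. This is a simplified illustration of the classic column-family idea, not Cassandra's storage format; the `users` table and the `put`/`get_row` helpers are hypothetical. Modern Cassandra exposes this through CQL tables whose columns are declared up front, although a row still stores only the columns that were actually written.

```python
from collections import defaultdict

# A toy "column family": row key -> {column name -> value}.
# Rows are sparse: each row stores only the columns it actually has.
users = defaultdict(dict)


def put(table, row_key, column, value):
    """Write one column of one row, creating the row on first use."""
    table[row_key][column] = value


def get_row(table, row_key, columns=None):
    """Return the whole row, or only those requested columns that exist."""
    row = table.get(row_key, {})  # .get does not create an empty row
    if columns is None:
        return dict(row)
    return {c: row[c] for c in columns if c in row}


put(users, "alice", "email", "alice@example.com")
put(users, "alice", "city", "Pune")
put(users, "bob", "email", "bob@example.com")
put(users, "bob", "last_login", "2016-01-20")  # a column "alice" does not have

print(get_row(users, "alice"))                  # {'email': 'alice@example.com', 'city': 'Pune'}
print(get_row(users, "bob", ["email", "city"]))  # {'email': 'bob@example.com'}
```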
Document Based
Document based NoSQL database management systems can be considered the latest craze,
one that has taken a lot of people by storm. These DBMSs work in a similar fashion to
column-based ones; however, they allow much deeper nesting and more complex structures
(e.g. a document, within a document, within a document).
Documents overcome the constraint of the one or two levels of key / value nesting found in
columnar databases. Basically, any complex and arbitrary structure can form a document,
which can be stored using these management systems.
Despite their powerful nature and the ability to query records by individual keys,
document based management systems have their own issues and downsides compared to
others. For example, retrieving a value from a record can mean fetching the whole
document, and the same goes for updates, all of which affects performance.
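The nesting, and the whole-document read/update cost described above, can be illustrated with a plain Python dictionary standing in for a stored document. This is a naive sketch: the `order` document and the `get_path`/`update_field` helpers are hypothetical, and real document stores differ in how much of this cost they hide (MongoDB, for instance, offers field-level update operators such as `$set`).

```python
import copy
import json

# A nested document: values can themselves be documents or lists of documents.
order = {
    "_id": "order-1001",
    "customer": {"name": "Asha", "address": {"city": "Mumbai", "zip": "400001"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}


def get_path(doc, path):
    """Follow a dotted path such as 'customer.address.city' into nested dicts."""
    node = doc
    for part in path.split("."):
        node = node[part]
    return node


def update_field(doc, path, value):
    """Naive whole-document update: copy everything, change one field, return the copy."""
    new_doc = copy.deepcopy(doc)
    *parents, last = path.split(".")
    target = new_doc
    for part in parents:
        target = target[part]
    target[last] = value
    return new_doc


print(get_path(order, "customer.address.city"))  # Mumbai
updated = update_field(order, "customer.address.zip", "400002")
# Changing one short field still means re-serializing (and rewriting) the whole document.
print(len(json.dumps(updated)), "characters rewritten for a 6-character change")
```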
Graph Based
Finally, a very interesting flavour of NoSQL database management systems is the graph
based one.
Graph based DBMS models represent data in a completely different way from the previous
three models. They use graph structures made of nodes connected to each other by edges,
which represent the relations between them.
As in mathematics (graph theory), certain operations are much simpler to perform with
these types of models, thanks to their nature of linking and grouping related pieces of
information (e.g. connected people).
These databases are commonly used by applications in which connections between entities
must be clearly established. For example, when you register with a social network of any
sort, your friends' connections to you and their friends' relations to you are much easier to
work with using graph-based database management systems.
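The social-network example above boils down to walking edges outward from one node. The sketch below finds "friends of friends" with a breadth-first search over an adjacency list. The people and the `degrees_of_separation` helper are hypothetical, and a graph database would express the same traversal in its own query language rather than in application code.

```python
from collections import deque

# Undirected friendship graph stored as an adjacency list (node -> set of neighbours).
friends = {
    "you": {"ana", "ben"},
    "ana": {"you", "chen"},
    "ben": {"you", "chen", "dev"},
    "chen": {"ana", "ben", "eli"},
    "dev": {"ben"},
    "eli": {"chen"},
}


def degrees_of_separation(graph, start):
    """Breadth-first search: map every reachable person to their hop count from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return distance


dist = degrees_of_separation(friends, "you")
friends_of_friends = sorted(p for p, d in dist.items() if d == 2)
print(friends_of_friends)  # ['chen', 'dev']
```

Because every edge is stored directly, each hop only inspects a node's own neighbours; in a relational schema the same question typically needs one self-join per hop.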
NoSQL databases have the following properties:
Design Simplicity
Horizontal Scaling
High Availability
The data structures used in NoSQL databases such as Cassandra differ from those used in
relational databases, which can make some operations faster than in a relational database.
NoSQL databases are increasingly used in Big Data and real-time web applications.
NoSQL databases are sometimes called Not Only SQL i.e. they may support SQL-like
query language.
NoSQL vs. RDBMS
Here are the differences between relational databases and NoSQL databases in tabular
format.
Relational Database                          | NoSQL Database
---------------------------------------------+----------------------------------------------------------
Handles data arriving at low velocity        | Handles data arriving at high velocity
Data arrives from one or a few locations     | Data arrives from many locations
Manages structured data                      | Manages structured, unstructured and semi-structured data
Supports complex transactions (with joins)   | Supports simple transactions
Single point of failure, with failover       | No single point of failure
Handles data in moderate volume              | Handles data in very high volume
Centralized deployments                      | Decentralized deployments
Transactions written in one location         | Transactions written in many locations
Gives read scalability                       | Gives both read and write scalability
Deployed in a vertical fashion               | Deployed in a horizontal fashion
1.2. Cassandra Basics and Terminology
Apache Cassandra is a highly scalable, distributed, high-performance NoSQL database.
Cassandra is designed to handle huge amounts of data.
In the image above, the circles are Cassandra nodes and the lines between the circles show
the distributed architecture, while the client sends data to a node. Cassandra handles huge
amounts of data through its distributed architecture: data is placed on different machines
with a replication factor greater than one, which provides high availability and no single
point of failure.
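How data ends up "on different machines with a replication factor greater than one" can be sketched with a token ring: each node owns a position on a ring of hash values, a row goes to the first node at or after its key's token, and with the SimpleStrategy replication strategy the copies go to the next nodes clockwise. This is a simplified model under stated assumptions: one token per node (real clusters usually use virtual nodes), and MD5 as the hash, as the older RandomPartitioner did (the default since Cassandra 1.2 is Murmur3Partitioner). The `Ring` class is hypothetical.

```python
import bisect
import hashlib


def token(key):
    """Hash a key onto the ring (MD5 here, as a simplification)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: one token per node, replicas placed clockwise."""

    def __init__(self, nodes):
        # Sort nodes by token so the list reads clockwise around the ring.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key, replication_factor):
        """Owner = first node whose token >= the key's token (wrapping around),
        followed by the next replication_factor - 1 distinct nodes clockwise."""
        n = len(self._ring)
        rf = min(replication_factor, n)
        start = bisect.bisect_left(self._tokens, token(key)) % n
        return [self._ring[(start + i) % n][1] for i in range(rf)]


ring = Ring(["node1", "node2", "node3", "node4", "node5"])
print(ring.replicas("user:alice", replication_factor=3))
```

Any node can run this calculation, which is one reason a client may contact any node in the cluster and still have its request routed to the right replicas.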
Cassandra History
Cassandra was first developed at Facebook for inbox search.
Facebook open sourced it in July 2008.
Apache incubator accepted Cassandra in March 2009.
Cassandra is a top level project of Apache since February 2010.
At the time of writing, the latest version of Apache Cassandra was 3.2.1.
The 3.0 release was made available in November 2015. Its features include:
The underlying storage engine has been rewritten to more closely match CQL
constructs
Support for materialized views (sometimes also called global indexes)
Java 8 is now the supported version
The Thrift-based Command Line Interface (CLI) is removed
Apache Cassandra Features
The main features of Cassandra are:
Massively Scalable Architecture: Cassandra has a masterless design where all nodes
are at the same level which provides operational simplicity and easy scale out.
Masterless Architecture: Data can be written and read on any node.
Linear Scale Performance: As more nodes are added, the performance (throughput) of
Cassandra increases roughly linearly.
No Single Point of Failure: Cassandra replicates data on different nodes, which ensures
there is no single point of failure.
Fault Detection and Recovery: Failed nodes can easily be restored and recovered.
Flexible and Dynamic Data Model: Supports a variety of data types, with fast writes and reads.
Data Protection: Data is protected by the commit log design and by built-in security
features such as backup and restore mechanisms.
Tunable Data Consistency: The level of consistency can be tuned, up to strong consistency
across the distributed architecture.
Multi Data Center Replication: Cassandra can replicate data across multiple data centers.
Data Compression: Cassandra can compress data by up to 80% with little overhead.
Cassandra Query Language: Cassandra provides a query language (CQL) that is similar to
SQL. This makes it easy for relational database developers to move from a relational
database to Cassandra.
Application of Cassandra
Cassandra is a non-relational database that can be used for different types of applications.
Here are some use cases where Cassandra is a good fit.
Messaging - Cassandra is a great database for companies that provide mobile phone and
messaging services. These companies have huge amounts of data, which Cassandra
handles well.
Internet of Things applications - Cassandra is a great database for applications where
data arrives at very high speed from many different devices or sensors.
Product catalogs and retail apps - Cassandra is used by many retailers for durable
shopping cart protection and fast product catalog input and output.
Social media analytics and recommendation engines - Cassandra is a great database
for many online companies and social media providers for analysis and for making
recommendations to their customers.
Distributed Database
Cassandra is distributed, which means that it is capable of running on multiple machines
while appearing to users as a unified whole. In fact, there is little point in running a single
Cassandra node. Although you can do it, and that’s acceptable for getting up to speed on
how it works, you quickly realize that you’ll need multiple machines to really realize any
benefit from running Cassandra. Much of its design and code base is specifically
engineered toward not only making it work across many different machines, but also for
optimizing performance across multiple data center racks, and even for a single Cassandra
cluster running across geographically dispersed data centers. You can confidently write data
to anywhere in the cluster and Cassandra will get it.
Once you start to scale many other data stores (MySQL, Bigtable), some nodes need to be
set up as masters in order to organize other nodes, which are set up as slaves. Cassandra,
however, is decentralized, meaning that every node is identical; no Cassandra node
performs certain organizing operations distinct from any other node. Instead, Cassandra
features a peer-to-peer protocol and uses gossip to maintain and keep in sync a list of nodes
that are alive or dead.
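To make the gossip idea concrete, here is a toy simulation in which nodes repeatedly pair up and merge what they know about each other's heartbeats, keeping the highest version seen. It is a sketch under simplifying assumptions (one random peer per node per round, integer heartbeat versions, no failures); the `merge` and `gossip_round` helpers are hypothetical. Cassandra's actual gossiper exchanges versioned endpoint state every second with up to three other nodes and decides whether a node is down with a phi accrual failure detector.

```python
import random


def merge(local, remote):
    """Keep the highest heartbeat version seen for each node."""
    for node, version in remote.items():
        if version > local.get(node, -1):
            local[node] = version


def gossip_round(views, rng):
    """Each node picks one random peer; the two exchange and merge their views."""
    names = list(views)
    for node in names:
        peer = rng.choice([n for n in names if n != node])
        merge(views[node], views[peer])
        merge(views[peer], views[node])


# Initially each node only knows its own heartbeat (version 1).
views = {n: {n: 1} for n in ["n1", "n2", "n3", "n4", "n5", "n6"]}
rng = random.Random(7)  # fixed seed so runs are repeatable
for rounds in range(1, 51):
    gossip_round(views, rng)
    if all(len(view) == len(views) for view in views.values()):
        break
print(f"every node knows every peer after {rounds} round(s)")
```

No node is in charge of distributing membership: information spreads peer to peer, so any single node can disappear without stopping the others from learning about the rest of the cluster.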
The fact that Cassandra is decentralized means that there is no single point of failure. All of
the nodes in a Cassandra cluster function exactly the same. This is sometimes referred to as
“server symmetry.” Because they are all doing the same thing, by definition there can’t be a
special host that is coordinating activities, as with the master/slave setup that you see in
MySQL, Bigtable, and so many others.
Decentralization, therefore, has two key advantages: it’s simpler to use than master/slave,
and it helps you avoid outages. It can be easier to operate and maintain a decentralized
store than a master/slave store because all nodes are the same. That means that you don’t
need any special knowledge to scale; setting up 50 nodes isn’t much different from setting
up one. There’s next to no configuration required to support it.
Moreover, in a master/slave setup, the master can become a single point of failure (SPOF).
To avoid this, you often need to add some complexity to the environment in the form of
multiple masters. Because all of the replicas in Cassandra are identical, failures of a node
won’t disrupt service.
Elastic Scalability
Scalability is an architectural feature of a system that can continue serving a greater number
of requests with little degradation in performance. Vertical scaling—simply adding more
hardware capacity and memory to your existing machine—is the easiest way to achieve this.
Horizontal scaling means adding more machines that have all or some of the data on them
so that no one machine has to bear the entire burden of serving requests. But then the
software itself must have an internal mechanism for keeping its data in sync with the other
nodes in the cluster.
Elastic scalability refers to a special property of horizontal scalability. It means that your
cluster can seamlessly scale up and scale back down. To do this, the cluster must be able to
accept new nodes that can begin participating by getting a copy of some or all of the data
and start serving new user requests without major disruption or reconfiguration of the
entire cluster. You don’t have to restart your process. You don’t have to change your
application queries. You don’t have to manually rebalance the data yourself. Just add
another machine—Cassandra will find it and start sending it work.
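Part of what lets a new node join without a manual rebalance is that Cassandra assigns data by token ranges (consistent hashing) rather than by something like "hash modulo number of nodes". The sketch below compares the two when a fifth node joins. It is simplified: one token per node, MD5 as the hash, and the `owner_ring`/`owner_modulo` helpers are hypothetical. With consistent hashing, every key that moves goes to the new node (which receives that data by streaming when it bootstraps), whereas modulo placement reshuffles most keys among all nodes.

```python
import bisect
import hashlib


def h(value):
    """Hash a string to a large integer (MD5, for illustration only)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def owner_ring(nodes, key):
    """Consistent hashing: the first node clockwise from the key's hash."""
    ring = sorted((h(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    return ring[bisect.bisect_left(tokens, h(key)) % len(ring)][1]


def owner_modulo(nodes, key):
    """Naive placement for comparison: hash modulo the number of nodes."""
    return nodes[h(key) % len(nodes)]


keys = [f"key{i}" for i in range(10_000)]
before = ["n1", "n2", "n3", "n4"]
after = before + ["n5"]  # one new node joins the cluster

moved_ring = [k for k in keys if owner_ring(before, k) != owner_ring(after, k)]
moved_mod = [k for k in keys if owner_modulo(before, k) != owner_modulo(after, k)]
print(f"ring: {len(moved_ring)} keys moved, modulo: {len(moved_mod)} keys moved")
```

Going from 4 to 5 nodes, modulo placement moves about 80% of keys. On the ring, only the keys in the range the new token takes over move, and with a single token per node that share can be uneven; virtual nodes spread it across many small ranges.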
Consistency
Consistency essentially means that a read always returns the most recently written value.
Consider two customers attempting to put the same item into their shopping carts on
an ecommerce site. If I place the last item in stock into my cart an instant after you do, you
should get the item added to your cart, and I should be informed that the item is no longer
available for purchase. This is guaranteed to happen when the state of a write is consistent
among all nodes that have that data.
But as we’ll see later, scaling data stores means making certain trade-offs between data
consistency, node availability, and partition tolerance. Cassandra is frequently called
“eventually consistent,” which is a bit misleading. Out of the box, Cassandra trades some
consistency in order to achieve total availability. But Cassandra is more accurately termed
“tuneably consistent,” which means it allows you to easily decide the level of consistency
you require, in balance with the level of availability.
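"Tuneably consistent" can be made concrete with a little arithmetic. In Cassandra, each request chooses a consistency level such as ONE, QUORUM or ALL (in cqlsh, for example, with the CONSISTENCY command), where QUORUM means a strict majority of the replicas. A widely used rule of thumb: if the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF), every read overlaps at least one replica that holds the latest write. The helper functions below are hypothetical and only do the arithmetic.

```python
def quorum(replication_factor):
    """Replicas needed for QUORUM: a strict majority, floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor, read_replicas, write_replicas):
    """True when every read replica set must overlap every write replica set."""
    return read_replicas + write_replicas > replication_factor


RF = 3
for name, r, w in [
    ("read ONE / write ONE", 1, 1),
    ("read QUORUM / write QUORUM", quorum(RF), quorum(RF)),
    ("read ONE / write ALL", 1, RF),
]:
    print(f"{name}: R={r}, W={w}, strongly consistent: {is_strongly_consistent(RF, r, w)}")
# read ONE / write ONE: R=1, W=1, strongly consistent: False
# read QUORUM / write QUORUM: R=2, W=2, strongly consistent: True
# read ONE / write ALL: R=1, W=3, strongly consistent: True
```

The trade-off is availability: with RF = 3, QUORUM reads and writes still succeed when one replica is down, while writing at ALL fails as soon as any replica is unreachable.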
Vskills certified six sigma yellow belt sample material
Vskills
 
Vskills production and operations management sample material
Vskills production and operations management sample materialVskills production and operations management sample material
Vskills production and operations management sample material
Vskills
 
vskills leadership skills professional sample material
vskills leadership skills professional sample materialvskills leadership skills professional sample material
vskills leadership skills professional sample material
Vskills
 
vskills facility management expert sample material
vskills facility management expert sample materialvskills facility management expert sample material
vskills facility management expert sample material
Vskills
 
Vskills international trade and forex professional sample material
Vskills international trade and forex professional sample materialVskills international trade and forex professional sample material
Vskills international trade and forex professional sample material
Vskills
 
Vskills production planning and control professional sample material
Vskills production planning and control professional sample materialVskills production planning and control professional sample material
Vskills production planning and control professional sample material
Vskills
 
Vskills purchasing and material management professional sample material
Vskills purchasing and material management professional sample materialVskills purchasing and material management professional sample material
Vskills purchasing and material management professional sample material
Vskills
 
Vskills manufacturing technology management professional sample material
Vskills manufacturing technology management professional sample materialVskills manufacturing technology management professional sample material
Vskills manufacturing technology management professional sample material
Vskills
 
certificate in agile project management sample material
certificate in agile project management sample materialcertificate in agile project management sample material
certificate in agile project management sample material
Vskills
 
Vskills angular js sample material
Vskills angular js sample materialVskills angular js sample material
Vskills angular js sample material
Vskills
 
Vskills c++ developer sample material
Vskills c++ developer sample materialVskills c++ developer sample material
Vskills c++ developer sample material
Vskills
 
Vskills c developer sample material
Vskills c developer sample materialVskills c developer sample material
Vskills c developer sample material
Vskills
 
Vskills financial modelling professional sample material
Vskills financial modelling professional sample materialVskills financial modelling professional sample material
Vskills financial modelling professional sample material
Vskills
 
Vskills basel iii professional sample material
Vskills basel iii professional sample materialVskills basel iii professional sample material
Vskills basel iii professional sample material
Vskills
 
Vskills telecom management professional sample material
Vskills telecom management professional sample materialVskills telecom management professional sample material
Vskills telecom management professional sample material
Vskills
 
Vskills retail management professional sample material
Vskills retail management professional sample materialVskills retail management professional sample material
Vskills retail management professional sample material
Vskills
 
Vskills contract law analyst sample material
Vskills contract law analyst sample materialVskills contract law analyst sample material
Vskills contract law analyst sample material
Vskills
 
Ad

Recently uploaded (20)

Quality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdfQuality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdf
Dr. Bindiya Chauhan
 
One Hot encoding a revolution in Machine learning
One Hot encoding a revolution in Machine learningOne Hot encoding a revolution in Machine learning
One Hot encoding a revolution in Machine learning
momer9505
 
YSPH VMOC Special Report - Measles Outbreak Southwest US 5-3-2025.pptx
YSPH VMOC Special Report - Measles Outbreak  Southwest US 5-3-2025.pptxYSPH VMOC Special Report - Measles Outbreak  Southwest US 5-3-2025.pptx
YSPH VMOC Special Report - Measles Outbreak Southwest US 5-3-2025.pptx
Yale School of Public Health - The Virtual Medical Operations Center (VMOC)
 
How to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POSHow to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POS
Celine George
 
World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...
World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...
World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...
larencebapu132
 
The ever evoilving world of science /7th class science curiosity /samyans aca...
The ever evoilving world of science /7th class science curiosity /samyans aca...The ever evoilving world of science /7th class science curiosity /samyans aca...
The ever evoilving world of science /7th class science curiosity /samyans aca...
Sandeep Swamy
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
Social Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy StudentsSocial Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy Students
DrNidhiAgarwal
 
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptxSCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
Ronisha Das
 
Geography Sem II Unit 1C Correlation of Geography with other school subjects
Geography Sem II Unit 1C Correlation of Geography with other school subjectsGeography Sem II Unit 1C Correlation of Geography with other school subjects
Geography Sem II Unit 1C Correlation of Geography with other school subjects
ProfDrShaikhImran
 
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar RabbiPresentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Md Shaifullar Rabbi
 
Ultimate VMware 2V0-11.25 Exam Dumps for Exam Success
Ultimate VMware 2V0-11.25 Exam Dumps for Exam SuccessUltimate VMware 2V0-11.25 Exam Dumps for Exam Success
Ultimate VMware 2V0-11.25 Exam Dumps for Exam Success
Mark Soia
 
Presentation of the MIPLM subject matter expert Erdem Kaya
Presentation of the MIPLM subject matter expert Erdem KayaPresentation of the MIPLM subject matter expert Erdem Kaya
Presentation of the MIPLM subject matter expert Erdem Kaya
MIPLM
 
Biophysics Chapter 3 Methods of Studying Macromolecules.pdf
Biophysics Chapter 3 Methods of Studying Macromolecules.pdfBiophysics Chapter 3 Methods of Studying Macromolecules.pdf
Biophysics Chapter 3 Methods of Studying Macromolecules.pdf
PKLI-Institute of Nursing and Allied Health Sciences Lahore , Pakistan.
 
How to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 WebsiteHow to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 Website
Celine George
 
2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx
contactwilliamm2546
 
SPRING FESTIVITIES - UK AND USA -
SPRING FESTIVITIES - UK AND USA            -SPRING FESTIVITIES - UK AND USA            -
SPRING FESTIVITIES - UK AND USA -
Colégio Santa Teresinha
 
How to manage Multiple Warehouses for multiple floors in odoo point of sale
How to manage Multiple Warehouses for multiple floors in odoo point of saleHow to manage Multiple Warehouses for multiple floors in odoo point of sale
How to manage Multiple Warehouses for multiple floors in odoo point of sale
Celine George
 
Operations Management (Dr. Abdulfatah Salem).pdf
Operations Management (Dr. Abdulfatah Salem).pdfOperations Management (Dr. Abdulfatah Salem).pdf
Operations Management (Dr. Abdulfatah Salem).pdf
Arab Academy for Science, Technology and Maritime Transport
 
apa-style-referencing-visual-guide-2025.pdf
apa-style-referencing-visual-guide-2025.pdfapa-style-referencing-visual-guide-2025.pdf
apa-style-referencing-visual-guide-2025.pdf
Ishika Ghosh
 
Quality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdfQuality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdf
Dr. Bindiya Chauhan
 
One Hot encoding a revolution in Machine learning
One Hot encoding a revolution in Machine learningOne Hot encoding a revolution in Machine learning
One Hot encoding a revolution in Machine learning
momer9505
 
How to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POSHow to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POS
Celine George
 
World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...
World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...
World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...
larencebapu132
 
The ever evoilving world of science /7th class science curiosity /samyans aca...
The ever evoilving world of science /7th class science curiosity /samyans aca...The ever evoilving world of science /7th class science curiosity /samyans aca...
The ever evoilving world of science /7th class science curiosity /samyans aca...
Sandeep Swamy
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
Social Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy StudentsSocial Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy Students
DrNidhiAgarwal
 
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptxSCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
Ronisha Das
 
Geography Sem II Unit 1C Correlation of Geography with other school subjects
Geography Sem II Unit 1C Correlation of Geography with other school subjectsGeography Sem II Unit 1C Correlation of Geography with other school subjects
Geography Sem II Unit 1C Correlation of Geography with other school subjects
ProfDrShaikhImran
 
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar RabbiPresentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Md Shaifullar Rabbi
 
Ultimate VMware 2V0-11.25 Exam Dumps for Exam Success
Ultimate VMware 2V0-11.25 Exam Dumps for Exam SuccessUltimate VMware 2V0-11.25 Exam Dumps for Exam Success
Ultimate VMware 2V0-11.25 Exam Dumps for Exam Success
Mark Soia
 
Presentation of the MIPLM subject matter expert Erdem Kaya
Presentation of the MIPLM subject matter expert Erdem KayaPresentation of the MIPLM subject matter expert Erdem Kaya
Presentation of the MIPLM subject matter expert Erdem Kaya
MIPLM
 
How to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 WebsiteHow to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 Website
Celine George
 
2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx
contactwilliamm2546
 
How to manage Multiple Warehouses for multiple floors in odoo point of sale
How to manage Multiple Warehouses for multiple floors in odoo point of saleHow to manage Multiple Warehouses for multiple floors in odoo point of sale
How to manage Multiple Warehouses for multiple floors in odoo point of sale
Celine George
 
apa-style-referencing-visual-guide-2025.pdf
apa-style-referencing-visual-guide-2025.pdfapa-style-referencing-visual-guide-2025.pdf
apa-style-referencing-visual-guide-2025.pdf
Ishika Ghosh
 

Vskills Apache Cassandra sample material

1. INTRODUCTION TO NOSQL

NoSQL databases try to offer certain functionality that more traditional relational database management systems do not. Whether it is holding simple key-value pairs for short periods for caching purposes, or keeping unstructured collections of data that cannot easily be handled with relational databases and the structured query language (SQL), they are here to help.

1.1. NoSQL Basics

A NoSQL (originally referring to "non SQL", "non relational" or "not only SQL") database provides a mechanism for storage and retrieval of data that is modeled in ways other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but did not obtain the "NoSQL" moniker until a surge of popularity in the early twenty-first century, triggered by the needs of Web 2.0 companies such as Facebook, Google, and Amazon.com. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages.

Motivations for this approach include simplicity of design, simpler "horizontal" scaling to clusters of machines (which is a problem for relational databases), and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, wide column, graph, or document) differ from those used by default in relational databases, making some operations faster in NoSQL. The particular suitability of a given NoSQL database depends on the problem it must solve. The data structures used by NoSQL databases are sometimes also viewed as "more flexible" than relational database tables.

Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, partition tolerance, and speed. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages instead of SQL (for instance, the lack of ability to perform ad-hoc joins across tables), the lack of standardized interfaces, and huge previous investments in existing relational databases.

Most NoSQL stores lack true ACID transactions, although a few databases, such as MarkLogic, Aerospike, FairCom c-treeACE, Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB, have made them central to their designs. Instead, most NoSQL databases offer a concept of "eventual consistency", in which database changes are propagated to all nodes "eventually" (typically within milliseconds), so queries might not return updated data immediately or might read data that is not accurate, a problem known as stale reads. Additionally, some NoSQL systems may exhibit lost writes and other forms of data loss. Fortunately, some NoSQL systems provide mechanisms such as write-ahead logging to avoid data loss. For distributed transaction processing across multiple databases, data consistency is an even bigger challenge, and one that is difficult for both NoSQL and relational databases. Even current relational databases "do not allow referential integrity constraints to span databases." Few systems maintain both ACID transactions and the X/Open XA standard for distributed transaction processing.
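To make the idea of a stale read concrete, here is a toy model of eventual consistency. It is a sketch under simplifying assumptions, not how any real database replicates data: every write lands on a single replica, and an explicit propagate() call stands in for background replication. The class and method names are hypothetical.

```python
# A toy model of eventual consistency; not how any real database is implemented.
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Writes land on one replica and reach the others only when propagate() runs."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = []  # (key, value) pairs not yet copied to every replica

    def write(self, key, value, replica_index=0):
        # Apply the write to a single replica immediately.
        self.replicas[replica_index].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # Read from whichever replica the client happens to reach.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Simulate background replication: apply pending writes, in order, everywhere.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["A", "B", "C"])
store.write("cart:42", ["book"], replica_index=0)
print(store.read("cart:42", replica_index=0))  # ['book']
print(store.read("cart:42", replica_index=2))  # None (a stale read)
store.propagate()
print(store.read("cart:42", replica_index=2))  # ['book']
```

Until propagate() runs, a client that happens to reach replica C sees no value at all; afterwards every replica agrees, which is the "eventually" in eventual consistency.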
Types and examples of NoSQL databases

There have been various approaches to classifying NoSQL databases, each with different categories and subcategories, some of which overlap. What follows is a basic classification by data model, with examples:

Column: Accumulo, Cassandra, Druid, HBase, Vertica, SAP HANA
Document: Apache CouchDB, ArangoDB, Clusterpoint, Couchbase, DocumentDB, HyperDex, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB
Key-value: Aerospike, ArangoDB, Couchbase, Dynamo, FairCom c-treeACE, FoundationDB, HyperDex, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, Berkeley DB
Graph: AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso, Stardog
Multi-model: Alchemy Database, ArangoDB, CortexDB, Couchbase, FoundationDB, MarkLogic, OrientDB

By design, NoSQL databases and management systems are relation-less (or schema-less). They are not based on a single model (such as the relational model of RDBMSs); each database, depending on its target functionality, adopts a different one. There are a handful of different operational models and functioning systems for NoSQL databases:

Key / Value: e.g. Redis, MemcacheDB, etc.
Column: e.g. Cassandra, HBase, etc.
Document: e.g. MongoDB, Couchbase, etc.
Graph: e.g. OrientDB, Neo4J, etc.

In order to better understand the roles and underlying technology of each database management system, let's quickly go over these four operational models.

Key / Value Based

We will begin our NoSQL modeling journey with key / value based database management systems, simply because they can be considered the most basic and backbone implementation of NoSQL. These types of databases work by matching keys with values, similar to a dictionary. There is no structure or relation. After connecting to the database server (e.g. Redis), an application can state a key (e.g. the_answer_to_life) and provide a matching value (e.g. 42), which can later be retrieved the same way by supplying the key.
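The key / value model described above can be sketched as a thin wrapper around an in-memory dictionary. This is only an illustration of the model, assuming a single process and no persistence; the KeyValueStore class is hypothetical and is not Redis's API (with Redis itself, the corresponding commands are SET and GET).

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """A minimal in-memory key/value store: a sketch of the model, not a real server."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        # Store (or overwrite) the value for this key; no schema, no relations.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Look the value up by its exact key; returns None if the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove the key and report whether it existed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.set("the_answer_to_life", 42)
print(store.get("the_answer_to_life"))  # 42
```

The only way to reach a value is through its exact key, which is what keeps lookups fast and the model simple.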
Key / value DBMSs are usually used for quickly storing basic information, and sometimes not-so-basic information, such as the result of a CPU- and memory-intensive computation. They are extremely performant, efficient and usually easily scalable. When it comes to computers, a dictionary usually refers to a special sort of data object: a collection of entries in which each individual key maps to a value.

Column Based

Column based NoSQL database management systems work by extending the simple nature of key / value based ones. Despite their reputation online for being hard to understand, these databases work very simply: they create collections of one or more key / value pairs that match a record. Unlike the predefined schemas of traditional relational databases, column-based NoSQL solutions do not require a pre-structured table to work with the data. Each record comes with one or more columns containing the information, and each record can have different columns. Basically, column-based NoSQL databases are two-dimensional arrays in which each key (i.e. row / record) has one or more key / value pairs attached to it, and these management systems allow very large and unstructured data to be kept and used (e.g. a record with tons of information). These databases are commonly used when simple key / value pairs are not enough and storing very large numbers of records with very large amounts of information is a must. DBMSs implementing column-based, schema-less models can scale extremely well.

Document Based

Document based NoSQL database management systems can be considered the latest craze, one that has taken a lot of people by storm. These DBMSs work in a similar fashion to column-based ones; however, they allow much deeper nesting and more complex structures (e.g. a document, within a document, within a document). Documents overcome the one- or two-level limit on key / value nesting found in columnar databases.
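The difference in nesting depth between the column and document models can be shown with plain Python data. This sketch only illustrates the shapes of the data, assuming dictionaries as stand-ins for rows and documents; the keys, values and helper functions are hypothetical and do not correspond to any database API.

```python
from typing import Any, Dict, List

# Column-based view: row key -> {column name -> value}. Rows need not share columns.
column_family: Dict[str, Dict[str, Any]] = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Alan", "city": "London", "phone": "555-0100"},
}

# Document-based view: values may themselves be nested documents or lists.
document: Dict[str, Any] = {
    "_id": "user:1",
    "name": "Ada",
    "address": {"city": "London", "geo": {"lat": 51.5, "lon": -0.13}},
    "orders": [{"id": 7, "items": ["book", "pen"]}],
}


def column_names(row_key: str) -> List[str]:
    # Each row carries its own set of columns.
    return sorted(column_family[row_key])


def nested_get(doc: Dict[str, Any], path: str) -> Any:
    # Follow a dotted path such as "address.geo.lat" through nested documents.
    value: Any = doc
    for part in path.split("."):
        value = value[part]
    return value


print(column_names("user:1"))                   # ['email', 'name']
print(column_names("user:2"))                   # ['city', 'name', 'phone']
print(nested_get(document, "address.geo.lat"))  # 51.5
```

The column rows stop at one level of key / value pairs per row key, while the document can nest as deeply as the application needs.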
Basically, any complex and arbitrary structure can form a document, which can be stored using these management systems. Despite their powerful nature, and the ability to query records by individual keys, document based management systems have their own issues and downsides compared to others. For example, retrieving a single value from a record can mean fetching the whole document, and the same goes for updates, all of which affects performance.

Graph Based

Finally, a very interesting flavour of NoSQL database management systems is the graph based one.
Graph based DBMS models represent data in a completely different way from the previous three models. They use graph structures, in which nodes are connected to each other by edges that represent relations. As in mathematical graph theory, certain operations are much simpler to perform with these models thanks to the way they link and group related pieces of information (e.g. connected people). These databases are commonly used by applications in which connections between entities must be clearly established. For example, when you register with a social network of any sort, your friends' connections to you, and their friends' relations to you, are much easier to work with using graph-based database management systems.

NoSQL databases have the following properties:

Design simplicity
Horizontal scaling
High availability

The data structures used in Cassandra are more specialized than those used in relational databases, which can make certain operations faster than with relational database structures. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL databases are sometimes called Not Only SQL, i.e. they may support SQL-like query languages.

NoSQL vs RDBMS

Here are the differences between relational databases and NoSQL databases in tabular format.

Relational Database | NoSQL Database
Handles data arriving at low velocity | Handles data arriving at high velocity
Data arrives from one or a few locations | Data arrives from many locations
Manages structured data | Manages structured, unstructured and semi-structured data
Supports complex transactions (with joins) | Supports simple transactions
Single point of failure, with failover | No single point of failure
Handles data in moderate volume | Handles data in very high volume
Centralized deployments | Decentralized deployments
Transactions written in one location | Transactions written in many locations
Gives read scalability | Gives both read and write scalability
Deployed in vertical fashion | Deployed in horizontal fashion

1.2. Cassandra Basics and Terminology

Apache Cassandra is a highly scalable, distributed and high-performance NoSQL database. Cassandra is designed to handle huge amounts of data.
In the image above, circles are Cassandra nodes and the lines between the circles show the distributed architecture, while the client sends data to a node. Cassandra handles huge amounts of data with its distributed architecture. Data is placed on different machines with a replication factor greater than one, which provides high availability and no single point of failure.

Cassandra History

Cassandra was first developed at Facebook for inbox search. Facebook open sourced it in July 2008. The Apache Incubator accepted Cassandra in March 2009, and Cassandra has been a top-level Apache project since February 2010. At the time of writing, the latest version of Apache Cassandra was 3.2.1. The 3.0 release was made available in November 2015. It includes the following features:

The underlying storage engine has been rewritten to more closely match CQL constructs
Support for materialized views (sometimes also called global indexes)
Java 8 is now the supported version
The Thrift-based Command Line Interface (CLI) is removed

Apache Cassandra Features

The main features of Cassandra are:

Massively Scalable Architecture: Cassandra has a masterless design in which all nodes are at the same level, which provides operational simplicity and easy scale-out.
Masterless Architecture: Data can be written to and read from any node.
Linear Scale Performance: As more nodes are added, Cassandra's throughput increases roughly linearly.
No Single Point of Failure: Cassandra replicates data on different nodes, which ensures there is no single point of failure.
Fault Detection and Recovery: Failed nodes can easily be restored and recovered.
Flexible and Dynamic Data Model: Supports a range of data types, with fast writes and reads.
Data Protection: Data is protected by the commit log design and by built-in security such as backup and restore mechanisms.
Tunable Data Consistency: The consistency level can be tuned, up to strong consistency across the distributed architecture.
Multi Data Center Replication: Cassandra can replicate data across multiple data centers.
Data Compression: Cassandra can compress data by up to 80% with little overhead.
Cassandra Query Language: Cassandra provides a query language (CQL) that is similar to SQL, which makes it easy for relational database developers to move to Cassandra.

Application of Cassandra

Cassandra is a non-relational database that can be used for different types of applications. Here are some use cases where Cassandra should be preferred:

Messaging - Cassandra is a great database for companies that provide mobile phone and messaging services. These companies have huge amounts of data, so Cassandra is a good fit for them.
Internet of Things applications - Cassandra is a great database for applications where data arrives at very high speed from many different devices or sensors.
Product catalogs and retail apps - Cassandra is used by many retailers for durable shopping cart protection and fast product catalog input and output.
Social media analytics and recommendation engines - Cassandra is a great database for many online companies and social media providers for analysis and recommendations to their customers.

Distributed Database

Cassandra is distributed, which means that it is capable of running on multiple machines while appearing to users as a unified whole. In fact, there is little point in running a single Cassandra node. Although you can do it, and that’s acceptable for getting up to speed on how it works, you quickly realize that you’ll need multiple machines to really realize any benefit from running Cassandra.
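One way to picture a cluster that "appears to users as a unified whole" is a token ring: every node owns a position on a hash ring, and each partition key is stored on the node that owns its position plus the next nodes clockwise, up to the replication factor. The sketch below is a simplified model under stated assumptions: one token per node, MD5 standing in for Cassandra's default Murmur3 partitioner, and placement similar in spirit to SimpleStrategy. The TokenRing class and node names are hypothetical.

```python
import bisect
import hashlib
from typing import List


class TokenRing:
    """Toy consistent-hashing ring: one token per node, clockwise replica placement."""

    def __init__(self, nodes: List[str], replication_factor: int) -> None:
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.rf = replication_factor
        # Sort nodes by token so the ring can be walked clockwise.
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _token(value: str) -> int:
        # MD5 stands in for a partitioner here; Cassandra's default is Murmur3.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def replicas(self, partition_key: str) -> List[str]:
        # The first replica is the node with the first token at or after the key's
        # token (wrapping around the ring); the next RF-1 nodes clockwise hold copies.
        start = bisect.bisect_left(self.tokens, self._token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes; which ones depends on the hashes
```

Because placement depends only on the order of tokens, adding a node changes the replicas for only some keys rather than reshuffling every key, which is one reason a ring-based design scales out smoothly.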
Much of its design and code base is specifically engineered not only to make it work across many different machines, but also to optimize performance across multiple data center racks, and even for a single Cassandra cluster running across geographically dispersed data centers. You can confidently write data anywhere in the cluster and Cassandra will get it.

Once you start to scale many other data stores (MySQL, Bigtable), some nodes need to be set up as masters in order to organize other nodes, which are set up as slaves. Cassandra, however, is decentralized, meaning that every node is identical; no Cassandra node performs certain organizing operations distinct from any other node. Instead, Cassandra features a peer-to-peer protocol and uses gossip to maintain and keep in sync a list of nodes that are alive or dead.

The fact that Cassandra is decentralized means that there is no single point of failure. All of the nodes in a Cassandra cluster function exactly the same. This is sometimes referred to as “server symmetry.” Because they are all doing the same thing, by definition there can’t be a special host that is coordinating activities, as with the master/slave setup that you see in MySQL, Bigtable, and so many others.

Decentralization, therefore, has two key advantages: it’s simpler to use than master/slave, and it helps you avoid outages. It can be easier to operate and maintain a decentralized store than a master/slave store because all nodes are the same. That means that you don’t need any special knowledge to scale; setting up 50 nodes isn’t much different from setting up one. There’s next to no configuration required to support it. Moreover, in a master/slave setup, the master can become a single point of failure (SPOF). To avoid this, you often need to add some complexity to the environment in the form of multiple masters. Because all of the replicas in Cassandra are identical, failures of a node won’t disrupt service.

Elastic Scalability

Scalability is an architectural feature of a system that can continue serving a greater number of requests with little degradation in performance. Vertical scaling, simply adding more hardware capacity and memory to your existing machine, is the easiest way to achieve this. Horizontal scaling means adding more machines that have all or some of the data on them so that no one machine has to bear the entire burden of serving requests. But then the software itself must have an internal mechanism for keeping its data in sync with the other nodes in the cluster.

Elastic scalability refers to a special property of horizontal scalability. It means that your cluster can seamlessly scale up and scale back down. To do this, the cluster must be able to accept new nodes that can begin participating by getting a copy of some or all of the data and start serving new user requests without major disruption or reconfiguration of the entire cluster. You don’t have to restart your process.
You don’t have to change your application queries. You don’t have to manually rebalance the data yourself. Just add another machine; Cassandra will find it and start sending it work.

Consistency

Consistency essentially means that a read always returns the most recently written value. Consider two customers attempting to put the same item into their shopping carts on an e-commerce site. If I place the last item in stock into my cart an instant after you do, you should get the item added to your cart, and I should be informed that the item is no longer available for purchase. This is guaranteed to happen when the state of a write is consistent among all nodes that have that data. But as we’ll see later, scaling data stores means making certain trade-offs between data consistency, node availability, and partition tolerance. Cassandra is frequently called “eventually consistent,” which is a bit misleading. Out of the box, Cassandra trades some consistency in order to achieve total availability. But Cassandra is more accurately termed “tuneably consistent,” which means it allows you to easily decide the level of consistency you require, in balance with the level of availability.
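A common way to reason about tunable consistency is the overlap rule: with replication factor RF, if the number of replicas a read waits for (R) plus the number a write waits for (W) is greater than RF, the read and write replica sets must share at least one node, so under normal conditions the read sees the latest acknowledged write. Cassandra's QUORUM level waits for a majority, floor(RF / 2) + 1 replicas. The sketch below only does this arithmetic for a few of Cassandra's consistency level names (ONE, TWO, THREE, QUORUM, ALL); the helper functions are hypothetical, and real behaviour also depends on timestamps, node failures and repair.

```python
# Hypothetical helpers for reasoning about Cassandra-style consistency levels.
def quorum(replication_factor: int) -> int:
    # QUORUM is a majority of the replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def replicas_for(level: str, replication_factor: int) -> int:
    # Number of replica acknowledgements a request waits for at each level.
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    needed = levels[level]
    if needed > replication_factor:
        raise ValueError(f"{level} needs {needed} replicas but RF is {replication_factor}")
    return needed


def read_sees_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    # Read and write replica sets are guaranteed to overlap only when R + W > RF.
    r = replicas_for(read_level, replication_factor)
    w = replicas_for(write_level, replication_factor)
    return r + w > replication_factor


for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_cl, write_cl, read_sees_latest_write(read_cl, write_cl, 3))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

Choosing ONE for both reads and writes favours availability and latency, while QUORUM for both, or ONE reads with ALL writes, trades some availability for reads that overlap the latest write.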
Types and examples of NoSQL databases
There have been various approaches to classifying NoSQL databases, each with different categories and subcategories, some of which overlap. What follows is a basic classification by data model, with examples:
Column: Accumulo, Cassandra, Druid, HBase, Vertica, SAP HANA
Document: Apache CouchDB, ArangoDB, Clusterpoint, Couchbase, DocumentDB, HyperDex, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB
Key-value: Aerospike, ArangoDB, Couchbase, Dynamo, FairCom c-treeACE, FoundationDB, HyperDex, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, Berkeley DB
Graph: AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso, Stardog
Multi-model: Alchemy Database, ArangoDB, CortexDB, Couchbase, FoundationDB, MarkLogic, OrientDB
By design, NoSQL databases and management systems are relation-less (or schema-less). They are not based on a single model (e.g. the relational model of RDBMSs); instead, each database adopts a different model depending on its target functionality. There are a handful of different operational models and functioning systems for NoSQL databases:
Key / Value: e.g. Redis, MemcacheDB, etc.
Column: e.g. Cassandra, HBase, etc.
Document: e.g. MongoDB, Couchbase, etc.
Graph: e.g. OrientDB, Neo4J, etc.
In order to better understand the roles and underlying technology of each database management system, let’s quickly go over these four operational models.
Key / Value Based
We will begin our NoSQL modeling journey with key / value based database management simply because it can be considered the most basic, backbone implementation of NoSQL. These types of databases work by matching keys with values, similar to a dictionary. There is no structure and no relation. After connecting to the database server (e.g. Redis), an application can state a key (e.g. the_answer_to_life) and provide a matching value (e.g. 42), which can later be retrieved the same way by supplying the key.
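The key / value model can be illustrated with a plain in-memory dictionary. This is a toy sketch, not Redis's API. The class `KeyValueStore` and its `set`, `get`, and `delete` methods are hypothetical names chosen to mirror the description above. There is no network connection, persistence, or expiry. Redis exposes the same idea through its SET and GET commands.

```python
from typing import Any, Dict, Hashable, Optional


class KeyValueStore:
    """Hypothetical toy key/value store: no schema, no relations, just key -> value."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}

    def set(self, key: Hashable, value: Any) -> None:
        # Stores the value, overwriting anything previously kept under the same key.
        self._data[key] = value

    def get(self, key: Hashable) -> Optional[Any]:
        # Retrieves the value by supplying the key; None if the key was never set.
        return self._data.get(key)

    def delete(self, key: Hashable) -> bool:
        # Removes the key; returns True only if it was present.
        if key in self._data:
            del self._data[key]
            return True
        return False


if __name__ == "__main__":
    store = KeyValueStore()
    store.set("the_answer_to_life", 42)
    print(store.get("the_answer_to_life"))  # 42
```

Because the store only maps keys to opaque values, any structure inside a value is the application's concern. This is what the text means by "no structure and no relation."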