Basic Introduction to Cassandra, with Architecture and Strategies
Covering the big data challenge and what a NoSQL database is.
The Big Data Challenge
The Cassandra Solution
The CAP Theorem
The Architecture of Cassandra
The Data Partition and Replication
This document provides an agenda and introduction for a presentation on Apache Cassandra and DataStax Enterprise. The presentation covers an introduction to Cassandra and NoSQL, the CAP theorem, Apache Cassandra features and architecture including replication, consistency levels and failure handling. It also discusses the Cassandra Query Language, data modeling for time series data, and new features in DataStax Enterprise like Spark integration and secondary indexes on collections. The presentation concludes with recommendations for getting started with Cassandra in production environments.
Archaic database technologies just don't scale under the always-on, distributed demands of modern IoT, mobile, and web applications. We'll start this intro to Cassandra by discussing how its approach is different and why so many awesome companies have migrated from the cold clutches of the relational world into the warm embrace of peer-to-peer architecture. After this high-level opening discussion, we'll briefly unpack the following:
• Cassandra's internal architecture and distribution model
• Cassandra's Data Model
• Reads and Writes
This is a presentation on the popular NoSQL database Apache Cassandra, created by our team in the context of the module "Business Intelligence and Big Data Analysis".
Apache Cassandra is a free, distributed, open source, and highly scalable NoSQL database that is designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure, linear scalability, and tunable consistency. Cassandra's architecture allows it to spread data across a cluster of servers and replicate across multiple data centers for fault tolerance. It is used by many large companies for applications that require high performance, scalability, and availability.
Cassandra is a decentralized structured storage system that was initially developed at Facebook to power their inbox search. It is based on Amazon's Dynamo and Google's BigTable data models. Cassandra provides tunable consistency, high availability with no single points of failure, horizontal scalability and elasticity. It allows data to be distributed across multiple data centers and easily handles the addition or removal of nodes.
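The Dynamo-style partitioning described above can be sketched as a hash ring: each node owns a token, and a key's replicas are found by walking clockwise from the key's token. A minimal illustrative sketch in Python (the node names, the MD5 hash, and the one-token-per-node layout are all simplifying assumptions; Cassandra itself uses Murmur3 and many vnodes per node):

```python
import hashlib
from bisect import bisect_right

def token(key: str) -> int:
    # Hash a key onto the ring (MD5 here for brevity; Cassandra uses Murmur3).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # One token per node for simplicity; real clusters use many vnodes.
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, key: str):
        # Walk clockwise from the key's token, collecting distinct nodes.
        t = token(key)
        idx = bisect_right(self.tokens, (t, chr(0x10FFFF)))
        out = []
        for i in range(len(self.tokens)):
            node = self.tokens[(idx + i) % len(self.tokens)][1]
            if node not in out:
                out.append(node)
            if len(out) == self.rf:
                break
        return out
```

Because replica placement depends only on the hash ring, adding or removing a node moves only the keys adjacent to its token, which is what makes elastic scaling cheap.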
This document introduces Apache Cassandra, a distributed column-oriented NoSQL database. It discusses Cassandra's architecture, data model, query language (CQL), and how to install and run Cassandra. Key points covered include Cassandra's linear scalability, high availability and fault tolerance. The document also demonstrates how to use the nodetool utility and provides guidance on backing up and restoring Cassandra data.
This document provides an overview of the Cassandra NoSQL database. It begins with definitions of Cassandra and discusses its history and origins from projects like Bigtable and Dynamo. The document outlines Cassandra's architecture including its peer-to-peer distributed design, data partitioning, replication, and use of gossip protocols for cluster management. It provides examples of key features like tunable consistency levels and flexible schema design. Finally, it discusses companies that use Cassandra like Facebook and provides performance comparisons with MySQL.
Agenda
- What is NOSQL?
- Motivations for NOSQL?
- Brewer’s CAP Theorem
- Taxonomy of NOSQL databases
- Apache Cassandra
- Features
- Data Model
- Consistency
- Operations
- Cluster Membership
- What does NOSQL mean for RDBMS?
The document provides an overview of Apache Cassandra's architecture and design. It was created to address the needs of building reliable, high-performing, and always-available distributed databases. Cassandra is based on Dynamo and BigTable and uses a distributed hashing technique to partition and replicate data across nodes. It supports configurable replication across multiple data centers for high availability. Writes are sent to the local node and replicated to other nodes based on consistency level, while reads can be served from any replica.
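The consistency-level behavior described above comes down to simple arithmetic: QUORUM is a majority of the replication factor, and a write succeeds once that many replicas acknowledge. A hedged sketch (the level names mirror Cassandra's, but the functions are illustrative, not driver code):

```python
def quorum(replication_factor: int) -> int:
    # QUORUM requires a strict majority of the replicas to acknowledge.
    return replication_factor // 2 + 1

def write_succeeds(replication_factor: int, live_replicas: int,
                   consistency: str) -> bool:
    # Number of acknowledgements each consistency level demands.
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[consistency]
    return live_replicas >= required
```

With a replication factor of 3, QUORUM needs 2 acknowledgements, so one replica can be down without blocking writes; ALL tolerates no failures, and ONE tolerates all but one.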
Cassandra is a distributed, column-oriented database that scales horizontally and is optimized for writes. It uses consistent hashing to distribute data across nodes and achieve high availability even when nodes join or leave the cluster. Cassandra offers flexible consistency options and tunable replication to balance availability and durability for read and write operations across the distributed database.
The document provides an introduction to Cassandra presented by Nick Bailey. It discusses key Cassandra concepts like cluster architecture, data modeling using CQL, and best practices. Examples are provided to illustrate how to model time-series data and denormalize schemas to support different queries. Tools for testing Cassandra implementations like CCM and client drivers are also mentioned.
This document introduces HBase, an open-source, non-relational, distributed database modeled after Google's BigTable. It describes what HBase is, how it can be used, and when it is applicable. Key points include that HBase stores data in columns and rows accessed by row keys, integrates with Hadoop for MapReduce jobs, and is well-suited for large datasets, fast random access, and write-heavy applications. Common use cases involve log analytics, real-time analytics, and message-centered systems.
1. Log structured merge trees store data in multiple levels with different storage speeds and costs, requiring data to periodically merge across levels.
2. This structure allows fast writes by storing new data in faster levels before merging to slower levels, and efficient reads by querying multiple levels and merging results.
3. The merging process involves loading, sorting, and rewriting levels to consolidate and propagate deletions and updates between levels.
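The three steps above can be modeled in miniature: merge the sorted runs, keep only the newest version of each key, and drop deletions once all levels are consolidated. A simplified Python sketch, assuming each run is a list of `(key, sequence, value)` tuples sorted by key and that `None` marks a tombstone (real LSM engines track far more metadata):

```python
import heapq

def merge_levels(levels):
    """Merge sorted (key, seq, value) runs from several levels, keeping the
    newest version of each key and dropping tombstones (value=None)."""
    merged = {}
    # heapq.merge yields the combined runs in sorted order; for each key,
    # the entry with the highest sequence number (newest write) wins.
    for key, seq, value in heapq.merge(*levels):
        if key not in merged or seq > merged[key][0]:
            merged[key] = (seq, value)
    # Tombstones can be discarded once every level has been merged.
    return [(k, v) for k, (s, v) in sorted(merged.items()) if v is not None]
```

This is why deletes in LSM-based stores are writes too: a tombstone must propagate down through the levels before the space is actually reclaimed.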
Cassandra is a distributed database designed to handle large amounts of data across commodity servers. It aims for high availability with no single points of failure. Data is distributed across nodes and replicated for redundancy. Cassandra uses a decentralized design with peer-to-peer communication and an eventually consistent model. It requires denormalized data models and queries to be defined prior to data structure.
NewSQL databases seek to provide the same scalable performance as NoSQL databases for online transaction processing workloads, while still maintaining the ACID guarantees of a traditional SQL database. NewSQL databases use new architectures like multi-version concurrency control and partition-level locking to allow for horizontal scaling and high availability without sacrificing consistency. They also provide highly optimized SQL engines to query data in a distributed environment.
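The multi-version concurrency control mentioned above can be reduced to a visibility rule: a transaction sees the newest version committed before it began, without taking read locks. A toy sketch of that rule (illustrative only, not any particular engine's implementation):

```python
def mvcc_read(versions, txn_start_ts):
    """versions: list of (commit_ts, value) for one row, oldest first.
    Returns the newest value committed before the transaction started."""
    visible = None
    for commit_ts, value in versions:
        if commit_ts <= txn_start_ts:
            visible = value  # still the newest visible version so far
        else:
            break  # later versions were committed after the txn began
    return visible
```

Readers never block writers and vice versa, which is one of the mechanisms NewSQL systems lean on to keep ACID semantics while scaling out.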
A Day in the Life of a ClickHouse Query: Webinar Slides (Altinity Ltd)
Why do queries run out of memory? How can I make my queries even faster? How should I size ClickHouse nodes for best cost-efficiency? The key to these questions and many others is knowing what happens inside ClickHouse when a query runs. This webinar is a gentle introduction to ClickHouse internals, focusing on topics that will help your applications run faster and more efficiently. We’ll discuss the basic flow of query execution, dig into how ClickHouse handles aggregation and joins, and show you how ClickHouse distributes processing within a single CPU as well as across many nodes in the network. After attending this webinar you’ll understand how to open up the black box and see what the parts are doing.
This presentation briefly describes key features of Apache Cassandra. It was held at the Apache Cassandra Meetup in Vienna in January 2014. You can access the meetup here: http://www.meetup.com/Vienna-Cassandra-Users/
MaxScale uses an asynchronous and multi-threaded architecture to route client queries to backend database servers. Each thread creates its own epoll instance to monitor file descriptors for I/O events, avoiding locking between threads. Listening sockets are added to a global epoll file descriptor that notifies threads when clients connect, allowing connections to be distributed evenly across threads. This architecture improves performance over the previous single epoll instance approach.
Presto on Apache Spark: A Tale of Two Computation Engines (Databricks)
The architectural tradeoffs between the map/reduce paradigm and parallel databases have been a long and open discussion since the dawn of MapReduce more than a decade ago. At Facebook, we have spent the past several years independently building and scaling both Presto and Spark to Facebook-scale batch workloads, and it is now increasingly evident that there is significant value in coupling Presto's state-of-the-art low-latency evaluation with Spark's robust and fault-tolerant execution engine.
Apache Spark is an open-source cluster computing framework for large-scale data processing. It supports batch processing, real-time processing, streaming analytics, machine learning, interactive queries, and graph processing. Spark core provides distributed task dispatching and scheduling. It works by having a driver program that connects to a cluster manager to run tasks on executors in worker nodes. Spark also introduces Resilient Distributed Datasets (RDDs) that allow immutable, parallel data processing. Common RDD transformations include map, flatMap, groupByKey, and reduceByKey while common actions include reduce.
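The RDD transformations named above can be emulated in plain Python to show their shape. This is an illustrative sketch, not the PySpark API; `flat_map` and `reduce_by_key` are hypothetical stand-ins for Spark's `flatMap` and `reduceByKey`:

```python
from collections import defaultdict
from functools import reduce

def flat_map(f, data):
    # flatMap: apply f to each element and flatten the resulting sequences.
    return [y for x in data for y in f(x)]

def reduce_by_key(f, pairs):
    # reduceByKey: group values by key, then fold each group with f.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return {k: reduce(f, vs) for k, vs in groups.items()}

# The classic word count expressed with these primitives.
lines = ["to be or not", "to be"]
words = flat_map(str.split, lines)
counts = reduce_by_key(lambda a, b: a + b, [(w, 1) for w in words])
```

In real Spark the same pipeline is lazy and distributed: each transformation builds up a lineage graph, and work only runs when an action such as `reduce` or `collect` is called.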
Replication and Consistency in Cassandra... What Does it All Mean? (Christopher Bradford, DataStax)
Many users set the replication strategy on their keyspaces to NetworkTopologyStrategy and move on with modeling their data or developing the next big application. But what does that replication strategy really mean? Let's explore replication and consistency in Cassandra.
How are replicas chosen?
Where does node topology (location in a cluster) come into play?
What can I expect when nodes are down and I'm querying with a consistency level of LOCAL_QUORUM?
If a rack goes down can I still respond to quorum queries?
These questions may be simple to test, but have nuances that should be understood. This talk will dive into these topics in a visual and technical manner. Seasoned Cassandra veterans and new users alike stand to gain knowledge about these critical Cassandra components.
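The rack question above boils down to counting live replicas against a majority. A toy model, assuming each replica's rack is listed explicitly and that losing a rack takes down every replica on it (NetworkTopologyStrategy tries to place replicas on distinct racks precisely so this check passes):

```python
def quorum_available(replica_racks, down_racks):
    """replica_racks: the rack hosting each replica of a partition.
    Returns True if a QUORUM read/write can still be served."""
    live = sum(1 for rack in replica_racks if rack not in down_racks)
    needed = len(replica_racks) // 2 + 1  # majority of the replicas
    return live >= needed
```

With RF=3 spread across three racks, one rack can fail and quorum survives; if two replicas share a rack, losing that rack loses quorum, which is why rack-aware placement matters.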
About the Speaker
Christopher Bradford Solutions Architect, DataStax
High performance drives Christopher Bradford. He has worked across various industries including the federal government, higher education, social news syndication, low-latency HD video delivery, and usability research. Chris combines application engineering principles and systems administration experience to design and implement performant systems. He has architected applications and systems to create highly available, fault-tolerant, distributed services in myriad environments.
This document provides a reference architecture for deploying Pivotal Cloud Foundry (PCF) on Dell EMC VxRail Appliances. PCF is a cloud platform that allows developers to deploy and scale applications with no downtime. VxRail Appliances integrate compute and storage on a single appliance in a hyper-converged infrastructure. This solution provides developers with a ready-to-use environment to build and deploy cloud native applications, while leveraging existing VMware infrastructure investments. The document describes the key technologies involved, including PCF, VxRail, and VMware vSphere and vSAN. It also provides guidance on hardware and software requirements, management, storage configuration, networking, and sizing considerations for this solution.
MongoDB is an open-source, document-oriented database that provides high performance and horizontal scalability. It uses a document-model where data is organized in flexible, JSON-like documents rather than rigidly defined rows and tables. Documents can contain multiple types of nested objects and arrays. MongoDB is best suited for applications that need to store large amounts of unstructured or semi-structured data and benefit from horizontal scalability and high performance.
Storm is a distributed and fault-tolerant realtime computation system. It was created at BackType/Twitter to analyze tweets, links, and users on Twitter in realtime. Storm provides scalability, reliability, and ease of programming. It uses components like Zookeeper, ØMQ, and Thrift. A Storm topology defines the flow of data between spouts that read data and bolts that process data. Storm guarantees processing of all data through its reliability APIs and guarantees no data loss even during failures.
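The spout-and-bolt data flow described above can be mimicked with Python generators: a spout emits a stream, each bolt consumes and re-emits it. An illustrative sketch only; Storm's real API uses explicit topology builders and runs each component as a distributed task:

```python
def spout(lines):
    # A spout emits a stream of tuples into the topology.
    for line in lines:
        yield line

def split_bolt(stream):
    # A bolt consumes tuples, transforms them, and emits new tuples.
    for line in stream:
        yield from line.split()

def count_bolt(stream):
    # A terminal bolt aggregates the stream into word counts.
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Wire the topology: spout -> split bolt -> count bolt.
result = count_bolt(split_bolt(spout(["storm storm", "bolt"])))
```

In Storm, the wiring also carries acknowledgement metadata so that any tuple that fails downstream can be replayed from the spout, which is how the no-data-loss guarantee is implemented.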
Modeling Data and Queries for Wide Column NoSQL (ScyllaDB)
Discover how to model data for wide column databases such as ScyllaDB and Apache Cassandra. Contrast the difference from traditional RDBMS data modeling, going from a normalized “schema first” design to a denormalized “query first” design. Plus how to use advanced features like secondary indexes and materialized views to get the answers you need from the same base table.
The NoSQL Principles and Basic Application of the Cassandra Model (Rishikese MR)
The slides discuss various aspects of NoSQL and the Cassandra model, giving a complete picture of both topics and their relationship. They also cover the merits and demerits of each, along with their features and examples.
NoSQL databases provide an alternative to traditional relational databases that is well-suited for large datasets, high scalability needs, and flexible, changing schemas. NoSQL databases sacrifice strict consistency for greater scalability and availability. The document model is well-suited for semi-structured data and allows for embedding related data within documents. Key-value stores provide simple lookup of data by key but do not support complex queries. Graph databases effectively represent network-like connections between data elements.
NoSQL databases were developed to address the limitations of relational databases in handling massive, unstructured datasets. NoSQL databases sacrifice ACID properties like consistency in favor of scalability and availability. The CAP theorem states that only two of consistency, availability, and partition tolerance can be achieved at once. Common NoSQL database types include document stores, key-value stores, column-oriented stores, and graph databases. NoSQL is best suited for large datasets that don't require strict consistency or relational structures.
Apache Cassandra is a free and open source distributed database management system that is highly scalable and designed to manage large amounts of structured data. It provides high availability with no single point of failure. Cassandra uses a decentralized architecture and is optimized for scalability and availability without compromising performance. It distributes data across nodes and data centers and replicates data for fault tolerance.
Apache Cassandra is a highly scalable, distributed NoSQL database designed to handle large amounts of data across commodity servers with no single point of failure. It provides high availability and scales linearly as nodes are added. Cassandra uses a flexible column-oriented data model and supports dynamic schemas. Data is replicated across nodes for fault tolerance, with Cassandra ensuring eventual consistency.
This document provides an overview of NoSQL databases and summarizes key information about several NoSQL databases, including HBase, Redis, Cassandra, MongoDB, and Memcached. It discusses concepts like horizontal scalability, the CAP theorem, eventual consistency, and data models used by different NoSQL databases like key-value, document, columnar, and graph structures.
Cassandra is a distributed database designed to handle large amounts of structured data across commodity servers. It provides linear scalability, fault tolerance, and high availability. Cassandra's architecture is masterless with all nodes equal, allowing it to scale out easily. Data is replicated across multiple nodes according to the replication strategy and factor for redundancy. Cassandra supports flexible and dynamic data modeling and tunable consistency levels. It is commonly used for applications requiring high throughput and availability, such as social media, IoT, and retail.
The document provides an overview of column databases. It begins with a quick recap of different database types and then defines and discusses column databases and column-oriented databases. It explains that column databases store data by column rather than by row, allowing for faster access to specific columns of data. Examples of column databases discussed include Cassandra, HBase, and Vertica. The document then focuses on Cassandra, describing its data model using concepts like keyspaces and column families. It also explains Cassandra's database engine architecture featuring memtables, SSTables, and compaction. The document concludes by mentioning some large companies that use Cassandra in production systems.
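The memtable/SSTable write and read path described above can be sketched in a few lines. A toy model, assuming a dict-backed memtable flushed into immutable sorted runs once a size threshold is hit (real SSTables live on disk with bloom filters, indexes, and background compaction):

```python
import bisect

class Memtable:
    def __init__(self, flush_threshold=4):
        self.data = {}          # in-memory write buffer
        self.threshold = flush_threshold
        self.sstables = []      # immutable sorted runs, newest first

    def put(self, key, value):
        self.data[key] = value
        if len(self.data) >= self.threshold:
            self.flush()

    def flush(self):
        # SSTables are sorted by key, enabling binary search on reads.
        self.sstables.insert(0, sorted(self.data.items()))
        self.data = {}

    def get(self, key):
        # Read path: check the memtable first, then SSTables newest to oldest.
        if key in self.data:
            return self.data[key]
        for table in self.sstables:
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None
```

Writes are append-only and never touch old runs, which is why this design is so fast for write-heavy workloads; compaction later merges the accumulated SSTables back down.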
Big Data Storage Concepts, from the "Big Data Concepts, Technology and Architec..." (raghdooosh)
The document discusses big data storage concepts including cluster computing, distributed file systems, and different database types. It covers cluster structures like symmetric and asymmetric, distribution models like sharding and replication, and database types like relational, non-relational and NewSQL. Sharding partitions large datasets across multiple machines while replication stores duplicate copies of data to improve fault tolerance. Distributed file systems allow clients to access files stored across cluster nodes. Relational databases are schema-based while non-relational databases like NoSQL are schema-less and scale horizontally.
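The sharding-versus-replication distinction above can be shown in two small functions: sharding places each record on exactly one node, replication gives every node a full copy. An illustrative sketch, assuming hash-based shard assignment:

```python
def shard(records, n_shards):
    # Sharding: each (key, value) record lands on exactly one shard,
    # chosen by hashing its key.
    shards = [[] for _ in range(n_shards)]
    for key, value in records:
        shards[hash(key) % n_shards].append((key, value))
    return shards

def replicate(records, n_copies):
    # Replication: every node holds a full duplicate copy of the data.
    return [list(records) for _ in range(n_copies)]
```

Sharding buys capacity (each node stores only a fraction of the data) while replication buys fault tolerance (any copy can serve a read); most distributed databases, Cassandra included, combine both.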
Cassandra is a highly scalable, distributed NoSQL database that is designed to handle large amounts of data across commodity servers while providing high availability without single points of failure. It uses a peer-to-peer distributed system where each node acts as both a client and server, allowing it to remain operational as long as one node remains active. Cassandra's data model consists of keyspaces that contain tables with rows and columns. Data is replicated across multiple nodes for fault tolerance.
NoSQL databases provide flexible schemas, horizontal scalability, and eventual consistency. There are four main NoSQL data models: key-value, document, column family, and graph. Key-value databases store data as unstructured (key, value) pairs. Document databases store data as documents with a flexible schema. Column family databases organize data by columns within rows. Graph databases model data as nodes and relationships. Popular NoSQL databases include MongoDB, Cassandra, HBase, Redis, Neo4j, and Elasticsearch.
HBase is a distributed, column-oriented database that stores data in tables divided into rows and columns. It is optimized for random, real-time read/write access to big data. The document discusses HBase's key concepts like tables, regions, and column families. It also covers performance tuning aspects like cluster configuration, compaction strategies, and intelligent key design to spread load evenly. Different use cases are suitable for HBase depending on access patterns, such as time series data, messages, or serving random lookups and short scans from large datasets. Proper data modeling and tuning are necessary to maximize HBase's performance.
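One common form of the "intelligent key design" mentioned above is key salting: prefixing sequential row keys (such as timestamps) with a deterministic salt so writes spread across regions instead of hotspotting the last one. A hedged sketch; HBase has no built-in salting, so the helper below is a hypothetical client-side convention:

```python
def salted_key(row_key: str, n_buckets: int = 8) -> str:
    # Derive a deterministic salt from the key bytes so the same key
    # always maps to the same bucket prefix.
    salt = sum(row_key.encode()) % n_buckets
    return f"{salt:02d}-{row_key}"
```

The trade-off is that range scans over the original key order now require n_buckets parallel scans, one per salt prefix, so salting suits write-heavy tables read by point lookup.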
From: DataWorks Summit 2017 - Munich - 20170406
HBase has established itself as the backend for many operational and interactive use cases, powering well-known services that support millions of users and thousands of concurrent requests. In terms of features HBase has come a long way, offering advanced options such as multi-level caching on- and off-heap, pluggable request handling, fast recovery options such as region replicas, table snapshots for data governance, tunable write-ahead logging, and so on. This talk is based on the research for an upcoming second release of the speaker's HBase book, correlated with practical experience in medium to large HBase projects around the world. You will learn how to plan for HBase, starting with the selection of matching use cases, to determining the number of servers needed, leading into performance tuning options. There is no reason to be afraid of using HBase, but knowing its basic premises and technical choices will make using it much more successful. You will also learn about many of the new features of HBase up to version 1.3, and where they are applicable.
This document provides an overview of NoSQL databases. It discusses that NoSQL databases are non-relational and do not follow the RDBMS principles. It describes some of the main types of NoSQL databases including document stores, key-value stores, column-oriented stores, and graph databases. It also discusses how NoSQL databases are designed for massive scalability and do not guarantee ACID properties, instead following a BASE model ofBasically Available, Soft state, and Eventually Consistent.
8. Objective
• Schema free
• Easy replication
• Simple API
• Eventually consistent
• Can handle huge amounts of data
NoSQL Database
• Simplicity of design
• Horizontal scaling
• Finer control over availability
9. Relational Database vs. NoSQL
Relational Database
• Supports powerful query language
• It has a fixed schema
• Follows ACID (Atomicity, Consistency, Isolation, and Durability)
• Supports transactions
NoSQL Database
• Supports very simple query language
• No Fixed Schema
• It is only “eventually consistent”
• Does not support transactions
10. Other NoSQL Database
• Apache HBase - HBase is an open-source, non-relational, distributed database modeled after Google's BigTable and written in Java. It is developed as part of the Apache Hadoop project and runs on top of HDFS, providing BigTable-like capabilities for Hadoop.
• MongoDB - MongoDB is a cross-platform, document-oriented database system that avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas, making the integration of data in certain types of applications easier and faster.
11. What is Apache Cassandra?
• Apache Cassandra™ is a free,
• Distributed,
• High-performance,
• Extremely scalable,
• Fault-tolerant (i.e. no single point of failure)
• post-relational database solution. Cassandra can serve both as a real-time data store (the "system of record") for online/transactional applications and as a read-intensive database for business intelligence systems.
12. Features of Cassandra
• Elastic scalability
• Always on architecture
• Fast linear-scale performance
• Flexible data storage
• Easy data distribution
• Transaction support
• Fast writes
13. History of Cassandra
• Cassandra was developed at Facebook for inbox search.
• It was open-sourced by Facebook in July 2008.
• Cassandra was accepted into Apache Incubator in March 2009.
• It was made an Apache top-level project since February 2010.
15. CAP Theorem
• A distributed system can provide only two of the following three guarantees:
• Availability
• Consistency
• Partition Tolerance
• Also known as Brewer's Theorem
16. Cassandra - AP
• Cassandra prioritizes Availability and Partition Tolerance
• Consistency is not guaranteed
• Tunable tradeoffs between latency and consistency
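The latency/consistency tradeoff on this slide is tunable per request via read and write consistency levels. A minimal sketch of the underlying quorum rule (the helper name is mine, not a driver API): with N replicas, reads of R replicas and writes of W replicas are guaranteed to overlap whenever R + W > N.

```python
# Tunable consistency in one rule: a read of R replicas is guaranteed to see
# the latest write of W replicas when R + W > N (the read and write sets
# must intersect). Function name is illustrative, not part of any driver.

def is_strongly_consistent(n_replicas: int, write_cl: int, read_cl: int) -> bool:
    """True if every read is guaranteed to observe the most recent write."""
    return read_cl + write_cl > n_replicas

# QUORUM writes + QUORUM reads with 3 replicas: 2 + 2 > 3 -> consistent
print(is_strongly_consistent(3, 2, 2))   # True
# ONE + ONE with 3 replicas: 1 + 1 <= 3 -> only eventually consistent
print(is_strongly_consistent(3, 1, 1))   # False
```

This is why QUORUM/QUORUM is a common middle ground: it trades some latency for read-your-writes behavior, while ONE/ONE maximizes availability and speed at the cost of consistency.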
17. Other Approaches - CP
• e.g. HBase
• Implements row locking for consistency
• HBase has a master/slave architecture and a single point of failure
• No Availability (A)
19. Architecture Overview
• Cassandra was designed with the understanding that
system/hardware failures can and do occur
• Peer-to-peer, distributed system
• All nodes the same
• Data partitioned among all nodes in the cluster
• Custom data replication to ensure fault tolerance
• Read/Write-anywhere design
20. Architecture Overview
• Each node communicates with the others through the Gossip protocol, which exchanges information across the cluster every second
• A commit log is used on each node to capture write activity; data durability is assured
• Data is also written to an in-memory structure (memtable) and then to disk once the memory structure is full (an SSTable)
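The write path on this slide (commit log, then memtable, then flush to an SSTable) can be sketched as a toy model. All class, attribute, and threshold names below are illustrative, not Cassandra internals; a real memtable flushes on memory size, not entry count.

```python
# Toy model of Cassandra's write path: append to the commit log for
# durability, update the in-memory memtable, and flush the memtable to an
# immutable "SSTable" once it fills up. Names and sizes are illustrative.

class ToyNode:
    def __init__(self, memtable_limit=2):
        self.commit_log = []            # durable, append-only (crash recovery)
        self.memtable = {}              # in-memory, sorted at flush time
        self.sstables = []              # immutable "on-disk" files
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Memtables are written out sorted by key, producing an SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

node = ToyNode()
node.write("a", 1)
node.write("b", 2)        # second write fills the memtable and triggers a flush
print(node.sstables)      # [{'a': 1, 'b': 2}]
```

Note the ordering: the commit log is appended before the memtable is touched, which is what lets a node replay the log and rebuild the memtable after a crash.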
21. Architecture Overview
• The schema used in Cassandra is modeled after Google Bigtable. It is a row-oriented, column structure
• A keyspace is akin to a database in the RDBMS world
• A column family is similar to an RDBMS table but is more flexible/dynamic
• A row in a column family is indexed by its key; other columns may be indexed as well
22. Components of Cassandra
• Node − It is the place where data is stored.
• Data center − It is a collection of related nodes.
• Cluster − A cluster is a component that contains one or more data centers.
• Commit log − The commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
• Mem-table − A mem-table is a memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes, for a single column family, there will be multiple mem-tables.
• SSTable − It is a disk file to which the data is flushed from the mem-table when its contents reach a threshold value.
• Bloom filter − These are quick, nondeterministic algorithms for testing whether an element is a member of a set; a special kind of cache. Bloom filters are consulted on every read.
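The Bloom filter component above answers "is this key definitely absent from this SSTable, or possibly present?" so reads can skip disk files entirely. A minimal sketch follows; the bit-array size, hash count, and use of MD5 are illustrative choices, not Cassandra's actual parameters.

```python
# A minimal Bloom filter: k hash functions set k bits per key. A membership
# test returning False is definitive ("never added"); True means "probably
# present" (false positives possible, false negatives impossible).
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive k positions by salting the key; MD5 here is for illustration.
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("row-42")
print(bf.might_contain("row-42"))   # True
print(bf.might_contain("row-99"))   # almost certainly False
```

For an SSTable, a "False" answer means the file cannot contain the row, so the read path skips it without any disk I/O; only "True" answers trigger an actual lookup.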
25. Partition Process
• Data is transparently partitioned across the nodes
• Data sent to a node is hashed and routed to a partition based on that hash
• The data partitioning strategy is controlled via the partitioner option inside the cassandra.yaml file
• Once a cluster is initialized with a partitioner option, it cannot be changed without reloading all of the data in the cluster
26. Partitioning Strategies
• Random Partitioning
• This is the default and recommended strategy.
• Partitions data as evenly as possible across all nodes using an MD5 hash of every column family row key
• Ordered Partitioning
• Stores column family row keys in sorted order across all nodes in the cluster.
• Sequential writes can cause hot spots
• More administrative overhead to load balance the cluster
• Uneven load balancing for multiple column families
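Random partitioning as described above can be sketched in a few lines: hash the row key with MD5 to get a token, then route the key to the node owning that token's range on the ring. The node count, token spacing, and node names below are made up for illustration.

```python
# Sketch of MD5-based random partitioning: every row key hashes to a token,
# and each node owns a contiguous token range on the ring. Nodes and their
# tokens here are illustrative, not real cluster configuration.
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 128

def token(row_key: str) -> int:
    # RandomPartitioner-style token: the MD5 digest interpreted as an integer.
    return int(hashlib.md5(row_key.encode()).hexdigest(), 16)

# Four nodes with evenly spaced tokens around the ring.
NODE_TOKENS = [(i * RING_SIZE // 4, f"node{i}") for i in range(4)]

def owner(row_key: str) -> str:
    """Return the node whose token range contains this key's token."""
    t = token(row_key)
    tokens = [tok for tok, _ in NODE_TOKENS]
    # Last node whose token is <= t owns the range [token, next_token).
    idx = bisect_right(tokens, t) - 1
    return NODE_TOKENS[idx][1]

print(owner("user:alice"), owner("user:bob"))
```

Because MD5 scatters even sequential keys uniformly over the token space, writes spread evenly across nodes; this is exactly the property ordered partitioning gives up in exchange for range scans.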
27. Replication
• To ensure fault tolerance and no single point of failure, you can replicate one or more copies of every row across nodes in the cluster
• Replication is controlled by two parameters of a keyspace: the replication factor and the replication strategy
• The replication factor controls how many copies of a row are stored in the cluster
• The replication strategy controls how the data is replicated.
28. Replication Strategies
• Simple Strategy
• Places the original row on a node determined by the partitioner. Additional replica rows are placed on the next nodes clockwise in the ring.
• Network Topology Strategy
• Allows replication between different racks in a data center and/or between multiple data centers
• The original row is placed according to the partitioner. Additional replica rows in the same data center are then placed by walking the ring clockwise until a node in a different rack from the previous replica is found. If there is no such node, additional replicas are placed in the same rack.
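The Simple Strategy placement rule above reduces to a short ring walk: start at the node the partitioner picked, then move clockwise collecting distinct nodes until the replication factor is met. The ring contents and indices below are illustrative.

```python
# Simple Strategy in miniature: the partitioner chooses the primary node,
# and additional replicas go to the next distinct nodes clockwise around
# the ring. Ring and parameters here are made up for illustration.

def place_replicas(ring, primary_index, replication_factor):
    """Walk the ring clockwise from the primary, collecting RF distinct nodes."""
    replicas = []
    for step in range(len(ring)):
        node = ring[(primary_index + step) % len(ring)]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas

ring = ["node0", "node1", "node2", "node3"]
print(place_replicas(ring, primary_index=2, replication_factor=3))
# ['node2', 'node3', 'node0']
```

Network Topology Strategy follows the same clockwise walk but additionally skips nodes until one in a different rack is found, which is what spreads replicas across failure domains.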
Editor's Notes
#9: Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a type of NoSQL database. Let us first understand what a NoSQL database does.
NoSQL Database
A NoSQL database (sometimes called Not Only SQL) is a database that provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. These databases are schema-free, support easy replication, have a simple API, are eventually consistent, and can handle huge amounts of data.
The primary objective of a NoSQL database is to have
simplicity of design,
horizontal scaling, and
finer control over availability.
NoSQL databases use different data structures compared to relational databases, which makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve.