Apache Cassandra
Overview
Road Map
• The Big Data Challenge
• The Cassandra Solution
• The CAP Theorem
• The Architecture of Cassandra
• The Data Partition and Replication
The Big Data Challenge
Data Proliferation
• Data production is growing exponentially
• Sources of data are multiplying
• Mobile Devices
• Cloud Services
• Historical Data
Three Vs: Trends in Big Data
• Volume:
• Terabytes -> Zettabytes
• Velocity:
• Batch -> Streaming
• Variety:
• Structured -> Unstructured
RDBMS Scaling
• Traditional RDBMS: no scale-out
• Cassandra: linear scale-out
The Cassandra Solution
Objective
• Schema Free
• Easy Replication
• Simple API
• Eventual consistency
• Can handle huge amounts of data
NoSQL Database
• Simplicity of design
• Horizontal scaling
• Finer control over availability
Relational Database vs. NoSQL
Relational Database
• Supports powerful query language
• It has a fixed schema
• Follows ACID (Atomicity, Consistency, Isolation, and Durability)
• Supports transactions
NoSQL Database
• Supports very simple query language
• No Fixed Schema
• It is only “eventually consistent”
• Does not support transactions
Other NoSQL Databases
• Apache HBase - HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java. It is developed as part of the Apache Hadoop project and runs on top of HDFS, providing BigTable-like capabilities for Hadoop.
• MongoDB - MongoDB is a cross-platform document-oriented database system that avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas, making the integration of data in certain types of applications easier and faster.
What is Apache Cassandra?
• Apache Cassandra™ is a free, post-relational database solution that is:
• Distributed
• High performance
• Extremely scalable
• Fault tolerant (i.e. no single point of failure)
• Cassandra can serve both as a real-time data store (the "system of record") for online/transactional applications and as a read-intensive database for business intelligence systems.
Features of Cassandra
• Elastic scalability
• Always-on architecture
• Fast linear-scale performance
• Flexible data storage
• Easy data distribution
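• Limited transaction support (atomicity and isolation at the row level, plus lightweight compare-and-set transactions, rather than full multi-row ACID transactions)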
• Fast writes
History of Cassandra
• Cassandra was developed at Facebook for inbox search.
• It was open-sourced by Facebook in July 2008.
• Cassandra was accepted into Apache Incubator in March 2009.
• It became an Apache top-level project in February 2010.
The CAP Theorem
CAP Theorem
• A distributed system can provide only two of the following three guarantees:
• Availability
• Consistency
• Partition Tolerance
• Also known as Brewer's Theorem
Cassandra: AP
• Cassandra prioritizes Availability and Partition Tolerance
• Strong consistency is not guaranteed by default
• There is a trade-off between latency and consistency, which can be tuned per request through the consistency level
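The latency/consistency trade-off above can be made concrete with simple replica-count arithmetic. The sketch below is illustrative only and is not Cassandra code: it assumes a replication factor `rf`, a write acknowledged by `w` replicas, and a read that consults `r` replicas. A read is guaranteed to touch at least one replica holding the latest acknowledged write when `r + w > rf`. The function names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Smallest majority of rf replicas."""
    return rf // 2 + 1


def read_sees_latest_write(rf: int, w: int, r: int) -> bool:
    """True when every set of r read replicas overlaps every set of w write replicas."""
    if not (1 <= w <= rf and 1 <= r <= rf):
        raise ValueError("w and r must be between 1 and rf")
    return r + w > rf


if __name__ == "__main__":
    rf = 3
    # Writing and reading a single replica is fast but may return stale data.
    print(read_sees_latest_write(rf, w=1, r=1))
    # Quorum writes plus quorum reads wait for more replicas but always overlap.
    print(read_sees_latest_write(rf, w=quorum(rf), r=quorum(rf)))
```

Waiting for fewer replicas lowers latency and keeps the request available when nodes are down; waiting for more replicas buys stronger consistency at the cost of both.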
Other Approaches: CP
• E.g. HBase
• Implements row locking for consistency
• HBase has a master/slave architecture and a single point of failure
• Availability is sacrificed (no A)
The Architecture of Cassandra
Architecture Overview
• Cassandra was designed with the understanding that
system/hardware failures can and do occur
• Peer-to-peer, distributed system
• All nodes are the same
• Data partitioned among all nodes in the cluster
• Custom data replication to ensure fault tolerance
• Read/Write-anywhere design
Architecture Overview
• Nodes communicate with each other through the gossip protocol, which exchanges information across the cluster every second
• A commit log is used on each node to capture write activity, which assures data durability
• Data is also written to an in-memory structure (memtable) and then to disk once the memory structure is full (an SSTable)
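The write path described above (commit log, then memtable, then a flush to an SSTable) can be sketched as a toy model. This is a simplified illustration, not Cassandra's implementation: the class name `ToyNode`, the row-count threshold `memtable_limit`, and the in-memory lists standing in for disk files are all assumptions made for the example.

```python
class ToyNode:
    """Toy model of one node's write path (not Cassandra's implementation)."""

    def __init__(self, memtable_limit: int = 3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (row count)
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory key -> value
        self.sstables = []     # immutable sorted tables, oldest first

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. record the write for crash recovery
        self.memtable[key] = value            # 2. apply it to the in-memory structure
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush once the memtable is full

    def flush(self) -> None:
        """Write the memtable out as a sorted, immutable table and reset state."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()  # flushed writes no longer need replaying

    def read(self, key: str):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

Because every write is an append to the log plus an in-memory update, writes stay cheap; the cost is that a read may have to consult several tables.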
Architecture Overview
• The schema used in Cassandra is modeled after Google Bigtable. It is a row-oriented, column structure
• A keyspace is akin to a database in the RDBMS world
• A column family is similar to an RDBMS table but is more flexible/dynamic
• A row in a column family is indexed by its key. Other columns may be indexed as well
Components of Cassandra
• Node − The place where data is stored.
• Data center − A collection of related nodes.
• Cluster − A component that contains one or more data centers.
• Commit log − The commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
• Mem-table − A mem-table is a memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes, for a single column family, there will be multiple mem-tables.
• SSTable − A disk file to which the data is flushed from the mem-table when its contents reach a threshold value.
• Bloom filter − A quick, probabilistic data structure for testing whether an element is a member of a set: it can return false positives but never false negatives. It acts as a special kind of cache. Bloom filters are consulted on every read to check whether an SSTable may contain the requested data.
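The Bloom filter idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the filter Cassandra uses: the class name, the default sizes, and the choice of SHA-256 with a per-hash prefix are all picked for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity over compactness

    def _positions(self, item: str):
        # Derive num_hashes positions by hashing the item with a different prefix each time.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))
```

On a read, a node could ask each SSTable's filter `might_contain(row_key)` and skip the disk lookup whenever the answer is `False`.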
Cassandra an overview
The Data Partition and Replication
Partition Process
• Data is transparently partitioned across the nodes
• The row key of data sent to a node is hashed, and the data is routed to a partition based on that hash
• The data partitioning strategy is controlled via the partitioner option in the cassandra.yaml file
• Once a cluster is initialized with a partitioner option, it cannot be changed without reloading all of the data in the cluster
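The hash-then-route step above can be sketched as a lookup on a token ring. This is a simplified illustration, not Cassandra's partitioner code: the node names are hypothetical, tokens are assigned evenly by hand, and the token range of 0 to 2^127 - 1 is assumed for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token range assumed for this sketch


def token_for(row_key: str) -> int:
    """Map a row key to a token using an MD5 hash."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(node_names):
    """Assign evenly spaced tokens; returns a list of (token, node) sorted by token."""
    step = RING_SIZE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def owner(ring, row_key: str) -> str:
    """First node whose token is >= the key's token, wrapping around the ring."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token_for(row_key))
    return ring[idx % len(ring)][1]


if __name__ == "__main__":
    ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", owner(ring, key))
```

Because placement depends only on the hash and the token assignment, any node can compute which node owns a key without consulting a central directory.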
Partitioning Strategies
• Random Partitioning
• This is the default and recommended strategy (since Cassandra 1.2 the default partitioner is Murmur3Partitioner, which also distributes rows by hash but uses MurmurHash instead of MD5)
• Partitions data as evenly as possible across all nodes using an MD5 hash of every column family row key
• Ordered Partitioning
• Stores column family row keys in sorted order across all nodes in the cluster
• Sequential writes can cause hot spots
• More administrative overhead to load balance the cluster
• Uneven load balancing for multiple column families
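The hot-spot problem with ordered partitioning can be seen with a small experiment. The sketch below is an illustration under simplifying assumptions, not Cassandra code: the node names are hypothetical, hashed placement uses a plain modulo instead of token ranges, and the ordered ranges are split by the first letter of the key.

```python
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names


def hashed_owner(row_key: str) -> str:
    """Random-style partitioning: choose the node from an MD5 hash of the key."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest, "big") % len(NODES)]


def ordered_owner(row_key: str) -> str:
    """Ordered partitioning: each node owns a contiguous range of first letters."""
    boundaries = ["g", "n", "t"]  # ranges: before g, g-m, n-s, t and later
    first = row_key[0].lower()
    for node, upper in zip(NODES, boundaries):
        if first < upper:
            return node
    return NODES[-1]


def distribution(keys, owner_fn) -> Counter:
    """Count how many keys each node receives."""
    return Counter(owner_fn(k) for k in keys)


if __name__ == "__main__":
    keys = [f"event-{i:06d}" for i in range(1000)]  # sequential keys
    print("ordered:", dict(distribution(keys, ordered_owner)))
    print("hashed: ", dict(distribution(keys, hashed_owner)))
```

With sequential keys that share a prefix, the ordered scheme sends every write to a single node, while the hashed scheme spreads the same keys across the cluster. The trade-off is that hashed placement gives up efficient range scans over row keys.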
Replication
• To ensure fault tolerance and no single point of failure, you can replicate one or more copies of every row across nodes in the cluster
• Replication is controlled by two parameters of a keyspace: the replication factor and the replication strategy
• The replication factor controls how many copies of a row are stored in the cluster
• The replication strategy controls how those copies are placed across the cluster
Replication Strategies
• SimpleStrategy
• Places the original row on a node determined by the partitioner. Additional replica rows are placed on the next nodes clockwise in the ring.
• NetworkTopologyStrategy
• Allows replication between different racks in a data center and/or between multiple data centers
• The original row is placed according to the partitioner. Additional replica rows in the same data center are then placed by walking the ring clockwise until a node in a different rack from the previous replica is found. If there is no such node, additional replicas are placed in the same rack.
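The two placement rules above can be sketched as walks around the ring. This is a simplified, single-data-center illustration and not Cassandra's implementation: the function names, the node and rack names, and the rule of preferring any rack not yet used (rather than only comparing with the previous replica) are assumptions made for the example.

```python
def simple_strategy(ring, start: int, rf: int):
    """ring: node names in token order; start: index of the node chosen by the partitioner."""
    if rf < 1:
        raise ValueError("rf must be at least 1")
    rf = min(rf, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(rf)]


def rack_aware(ring, racks, start: int, rf: int):
    """Walk clockwise, preferring nodes on racks not used yet.

    racks maps node name -> rack name. If there are not enough distinct racks,
    the remaining replicas fall back to skipped nodes, in ring order.
    """
    if rf < 1:
        raise ValueError("rf must be at least 1")
    n = len(ring)
    rf = min(rf, n)
    order = [ring[(start + i) % n] for i in range(n)]
    replicas = [order[0]]              # the original row goes where the partitioner says
    used_racks = {racks[order[0]]}
    skipped = []
    for node in order[1:]:
        if len(replicas) == rf:
            break
        if racks[node] not in used_racks:
            replicas.append(node)
            used_racks.add(racks[node])
        else:
            skipped.append(node)       # same rack as an existing replica; keep as fallback
    for node in skipped:
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas


if __name__ == "__main__":
    ring = ["n1", "n2", "n3", "n4"]
    racks = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack2"}
    print(simple_strategy(ring, start=0, rf=2))
    print(rack_aware(ring, racks, start=0, rf=2))
```

With a replication factor of 2, the simple walk places both copies on the same rack in this layout, while the rack-aware walk skips ahead so that losing one rack does not lose every copy of the row.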

Editor's Notes

  • #9: Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a type of NoSQL database, so it helps to first understand what a NoSQL database does. A NoSQL database (sometimes read as "Not Only SQL") provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. These databases are schema-free, support easy replication, have a simple API, are eventually consistent, and can handle huge amounts of data. The primary objectives of a NoSQL database are simplicity of design, horizontal scaling, and finer control over availability. NoSQL databases use different data structures from relational databases, which makes some operations faster. The suitability of a given NoSQL database depends on the problem it must solve.