Cloud
Computing
Apache Cassandra
Omid Mirabdolazimi
Fardin Jamshidi
Professor: Dr. Sadegh Dorri Nogoorani
1397/10/09
• Open source distributed database management
system for handling huge amounts of data across
many commodity systems
• Cassandra is a “NoSQL” or “Non-Relational”
database and can be described as a mix between a
“Key-value Store” and a “Column-Oriented”
database
What Is Apache Cassandra?
Where Did Cassandra Come From?
• Cassandra was initially created at Facebook
• Combination of Google Big Table and Amazon Dynamo
• It was created to power the “Inbox Search” feature
• Cassandra was released as open source in July of 2008
• It became an Apache Incubator project in February of
2009 and graduated to a top-level project about a year after that
Cassandra Architecture
• Built with the understanding that hardware & software
failures can happen
• Peer to Peer Architecture
• All nodes are the same
• Read/Write Anywhere
• Gossip Protocol
• Commit Log Captures All Activity
• Well suited for cloud deployments
Features of Cassandra
• Decentralized – No master & no single point of failure. Data is
distributed across the cluster
• Replication – Tailored for multiple-data center deployment
• Scalability – New machines can easily be added with no
downtime or interruption
• Fault Tolerance – Failed nodes can be replaced with no
downtime
• Cassandra Query Language (CQL) – An SQL-like alternative
More Advantages
• Always On Architecture – Continuous availability
with no downtime
• Faster linear-scale performance
• Operational Simplicity – Administration is
simplified
• Transaction Support – Atomic batches and lightweight (compare-and-set) transactions, not full ACID transactions
• No new equipment required - Very economical
CQL (Cassandra Query Language)
• CQL is very similar to SQL (Structured Query
Language) in terms of syntax and commands
• Statements directly change data and/or change the
way data is stored
• All statements end with a semi-colon
SELECT * FROM sampletable;
Who Uses Cassandra?
• Facebook
• WalmartLabs
• Constant Contact
• Digg
• AppScale
• Netflix
• Twitter
• Zoho
• IBM
• FormSpring
• Cisco WebEx
• Rackspace
• OpenX
• Adobe
• Comcast
• eBay
Apache Cassandra
Architecture & Data Model
The Data Model
• Cassandra is in a data model class of its own but
can be described as a hybrid of a “Key-value Store”
and a “Column-Oriented” database.
• Cassandra was modeled after Google’s “Big Table”
and Amazon’s “Dynamo”
• Cassandra even has similarities to a “Relational
Database”, but it is much more flexible.
Keyspaces
• A Keyspace is a “container” for your data
• Similar to a Database in an RDBMS
• Used to group Column Families together
• Typically, a Cluster has one Keyspace per application
• Replication is controlled on a per-keyspace basis
Column Family
• A “Column Family” is similar to a “Table” in an RDBMS
because it has columns and rows
• Relational database tables use a predefined, fixed
schema. Column families do not, which makes them
very flexible
• Cassandra’s data model promotes “Denormalization”,
which is the opposite of the normalization favored in
relational databases
Example
Replication Strategies
• SimpleStrategy – Use this for a single data center. It
places the first replica on a node determined by the
partitioner and the remaining replicas on the next nodes
clockwise in the ring. It does not consider topology.
• NetworkTopologyStrategy – Use this if you plan to have your
cluster span multiple data centers. It specifies
how many replicas you want in each data center.
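The SimpleStrategy placement rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cassandra's implementation: the node names, the MD5-based `token` function, and the use of a single token per node are all made up for the example.

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Hypothetical partitioner: map a string to a position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

def simple_strategy_replicas(nodes, partition_key, replication_factor):
    """Return the nodes that would hold replicas of partition_key.

    The first replica is the node whose token is the first one at or after
    the key's token; further replicas are the next nodes clockwise on the
    ring. Racks and data centers are ignored, as in SimpleStrategy.
    """
    ring = sorted((token(name), name) for name in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
replicas = simple_strategy_replicas(nodes, "user:42", 3)
print(replicas)
```

NetworkTopologyStrategy would apply a similar walk around the ring separately per data center, which this sketch does not attempt.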
Apache Cassandra
Relational Database Comparison
Relational Databases
Relational databases have been around for 40+ years and
will always have a place in technology. There are many
situations where a relational database is the best choice.
NoSQL databases are NOT going to replace relational
databases in all areas.
Advantages of NoSQL
• Handles Big Data Much Better Than RDBMS
• Cheaper Hardware & Software
• Easier Scaling
• Much More Flexible (Relatively Schema-Free)
• Map and Reduce Capability
• Less Management
Advantages of RDBMS
• Better For Complex Data
• Better Support
• Better For “Relational” or “Object-
Oriented” Data
• Been Around Longer
• Better Data Analytics
Scale-Up vs. Scale-Out
• Scale-Up Architecture (Relational) – Capacity is confined to a
single node, so scaling means adding more resources (CPU,
memory, etc.) to that node.
• Scale-Out Architecture (NoSQL) – The total amount of disk
space can be easily expanded as needed by adding nodes. When a
storage array reaches its maximum, another picks up where the last
left off. This makes scaling significantly easier and less
expensive.
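One reason scale-out can be cheap is that, on a hash ring, adding a node only moves the keys that the new node takes over. The sketch below illustrates that property. It assumes a hypothetical MD5-based `token` function, invented node names, and one token per node, so it is a simplification rather than a description of Cassandra internals.

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Hypothetical hash function placing nodes and keys on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

def owner(nodes, key):
    """Return the node whose token is the first at or after the key's token."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[index][1]

keys = [f"row-{i}" for i in range(1000)]
before = {k: owner(["node-1", "node-2", "node-3"], k) for k in keys}
after = {k: owner(["node-1", "node-2", "node-3", "node-4"], k) for k in keys}

# Keys whose owner changed once node-4 joined the ring.
moved = [k for k in keys if before[k] != after[k]]
only_to_new_node = all(after[k] == "node-4" for k in moved)
print(len(moved), "of", len(keys), "keys moved; all to node-4:", only_to_new_node)
```

Every key that changes owner lands on the new node, and the existing nodes never exchange data with each other.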
Things To Consider
When choosing a system/database for your projects or company,
there are many things to factor in…
Complexity
Type of Data
Budget
Amount of Data
Traffic
Programming Languages
Apache Cassandra
How It Works
The “Write” Process
• Logs data to the commit log
• Writes data to the memtable
• Flushes data from the memtable
• Stores data on disk in SSTables
• Compaction
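The five steps above can be sketched as a toy, single-node model. The class name, the `memtable_limit` threshold, and the in-memory lists standing in for disk files are assumptions for illustration; real Cassandra also recycles commit log segments after a flush, which is left out here.

```python
class ToyNode:
    """Toy single-node write path: commit log, memtable, SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory table holding recent writes
        self.sstables = []     # immutable "on-disk" tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for durability
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3-4. Persist the memtable as a new immutable SSTable sorted by key.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def compact(self):
        # 5. Merge all SSTables into one; the newest value for a key wins.
        if not self.sstables:
            return
        merged = {}
        for table in self.sstables:  # oldest to newest
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]

node = ToyNode(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    node.write(key, value)
tables_before_compaction = len(node.sstables)
node.compact()
print(tables_before_compaction, node.sstables)
```

Note that the write to "a" never modifies the first SSTable; the newer value simply lives in a later table until compaction merges them.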
Deleting Data
• Cassandra does not delete data in place, because
SSTables are immutable
• Instead, Cassandra marks the data with a
“Tombstone”, which is a marker in a row that
indicates that a column was deleted; the data is
physically removed later, during compaction
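A small sketch of the tombstone idea follows. The `TOMBSTONE` marker, the function names, and the dictionaries standing in for the memtable and SSTables are hypothetical. Real Cassandra also keeps tombstones for a grace period (the `gc_grace_seconds` table option) before purging them, which this sketch skips.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a tombstone

def delete(memtable, key):
    """A delete is just another write: it records a tombstone."""
    memtable[key] = TOMBSTONE

def read(memtable, sstables, key):
    """Check the newest data first; a tombstone hides any older value."""
    for table in [memtable] + sstables[::-1]:
        if key in table:
            value = table[key]
            return None if value is TOMBSTONE else value
    return None

def compact(tables):
    """Merge tables (oldest first) and drop keys that ended up tombstoned."""
    merged = {}
    for table in tables:
        merged.update(table)
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}

sstables = [{"alice": "a@example.com", "bob": "b@example.com"}]
memtable = {}
delete(memtable, "alice")

print(read(memtable, sstables, "alice"))  # the tombstone hides the old value
print(compact(sstables + [memtable]))     # the deleted key is gone after the merge
```

The original SSTable is never modified: the old value for "alice" stays in it until a compaction writes a new table without it.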
The “Read” Process
• Reading data is done in parallel across all
nodes in a cluster
• If the node with the requested data is down
then the data will be read from the node
which holds a replica of the data
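The replica fallback described above can be illustrated with a deliberately simple sketch. The replica dictionaries and the `read_from_replicas` helper are invented for this example, and consistency levels and read repair are ignored.

```python
def read_from_replicas(replicas, key):
    """Return the value from the first replica that is up."""
    for replica in replicas:
        if replica["up"]:
            return replica["data"].get(key)
    raise RuntimeError("no replica available for " + repr(key))

replicas = [
    {"name": "node-a", "up": False, "data": {"user:1": "Frodo"}},
    {"name": "node-b", "up": True, "data": {"user:1": "Frodo"}},
    {"name": "node-c", "up": True, "data": {"user:1": "Frodo"}},
]

# node-a is down, so the read is served by node-b instead.
result = read_from_replicas(replicas, "user:1")
print(result)
```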
Apache Cassandra
Cassandra Software
Compatible Operating Systems
• GNU/Linux
• Microsoft Windows
• Mac OS X
Prerequisites & Requirements
• Requires the latest stable release of Java 7
• 4GB+ memory is recommended for
production environment
Cassandra Server
• Apache Cassandra Core Server
• Nodetool Admin Command Line Interface
• CQLSH and Cassandra-cli Development Shell
Apache provides binary tarballs and Debian packages -
http://cassandra.apache.org/download/
DataStax Community Distribution
DataStax Community Edition is a free software package that offers…
• Apache Cassandra Core Server
• Nodetool Admin Command Line Interface
• CQLSH and Cassandra-cli Development Shell
• Windows Installer & Mac OS X Binary
• OpsCenter Community Version
• Sample Database & Application
• CQL Utility
DataStax Enterprise Distribution
DataStax Enterprise Edition is a premium software package that offers…
• All that Community Offers
• OpsCenter Enterprise Version
• Apache Hadoop (With MapReduce, Hive & Pig)
• Enterprise Search
• Premium Support
• Advanced Security Features
• Workload Management Benefits
OpsCenter
• Simplified Data Management
• Easy To Use Visual Interface
• Centralized Dashboard
• Easy Installation
• Real-Time Analytics
• Multiple Data Center Support
• Automated Management
• Rebalance & Repair Clusters
• Alerts & Notifications
Available Client Drivers
There are client drivers available for the following languages…
Go
Node.js
Clojure
C++
Java
Python
Ruby
C# / .NET
Apache Cassandra
CQL Overview
Defining Keyspaces in CQL
CREATE KEYSPACE people
WITH REPLICATION = { 'class' : 'SimpleStrategy',
'replication_factor' : 3 };
CREATE KEYSPACE people
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy',
'dc1' : 3, 'dc2' : 2};
Compound Keys
A compound primary key includes the partition key, which
determines on which node data is stored, and one or more
additional columns that determine clustering.
You need to know which fields you want to be able to sort and
order by before you create the data model.
Creating Compound Keys
To create a compound primary key, use the keywords, PRIMARY KEY,
followed by the comma-separated list of column names enclosed in
parentheses.
CREATE TABLE emp (
empID int,
deptID int,
first_name varchar,
last_name varchar,
PRIMARY KEY (empID, deptID)
);
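To make the two roles in the compound key concrete, here is a rough Python model of the emp table above. The dictionary layout and the sample employee rows are assumptions for illustration only: empID (the partition key) selects a partition, and rows inside it stay sorted by deptID (the clustering column).

```python
from collections import defaultdict

# partition key (empID) -> list of rows kept sorted by clustering column (deptID)
partitions = defaultdict(list)

def insert(emp_id, dept_id, first_name, last_name):
    """Insert or overwrite the row identified by (emp_id, dept_id)."""
    rows = partitions[emp_id]
    # A row with the same full primary key is replaced, not duplicated.
    rows[:] = [row for row in rows if row[0] != dept_id]
    rows.append((dept_id, first_name, last_name))
    rows.sort(key=lambda row: row[0])  # clustering order within the partition

insert(104, 15, "Jane", "Smith")
insert(104, 7, "Jane", "Smith")
insert(130, 5, "Sam", "Lee")
insert(104, 7, "Jane", "Smyth")  # same primary key, so this overwrites

print(dict(partitions))
```

Because ordering is fixed by the clustering columns at write time, the sort order has to be chosen when the table is designed.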
UUIDs
A universally unique identifier (UUID) is a field type that is used to avoid
collisions in column names.
32 hex digits, 0-9 or a-f, which are case-insensitive, separated by dashes, -,
after the 8th, 12th, 16th, and 20th digits. For example:
01234567-0123-0123-0123-0123456789ab
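The textual format described above can be checked with Python's standard `uuid` module. This sketch sits outside Cassandra itself, and the regular expression is simply a direct transcription of the rule in the text.

```python
import re
import uuid

# 8-4-4-4-12 hex digits, case-insensitive, separated by dashes.
UUID_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

example = "01234567-0123-0123-0123-0123456789ab"
parsed = uuid.UUID(example)

print(bool(UUID_PATTERN.match(example)))      # matches the documented layout
print(len(parsed.hex))                        # number of hex digits without dashes
print(uuid.UUID(example.upper()) == parsed)   # case does not matter

random_id = uuid.uuid4()  # a random (version 4) UUID generated on the client
print(random_id)
```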
TIMEUUIDs
A time-based universally unique identifier (TIMEUUID) is a field type that is
used to avoid collisions in column names and that also encodes its creation time.
A timeuuid uses the time in 100-nanosecond intervals since 00:00:00.00 UTC,
15 October 1582 (60 bits), a clock sequence number for prevention of duplicates (14 bits),
plus the IEEE 802 MAC address (48 bits) to generate a unique identifier.
For example: d2177dd0-eaa2-11de-a572-001b779c76e3
Functions that work with timeuuids include “now()”, which generates a new timeuuid,
“dateOf()”, which extracts the timestamp from one, and “minTimeuuid()/maxTimeuuid()”,
which build bounds for time-range queries.
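The fields packed into a time-based UUID can be inspected with Python's standard `uuid` module. The sketch below decodes the example value from the text. The helper name `timeuuid_to_datetime` is made up for this example, and the constant is the offset between the 15 October 1582 starting point of version 1 UUID timestamps and the Unix epoch.

```python
import datetime
import uuid

# 100-ns intervals between 1582-10-15 and 1970-01-01 (the Unix epoch).
GREGORIAN_TO_UNIX = 0x01B21DD213814000

def timeuuid_to_datetime(value: uuid.UUID) -> datetime.datetime:
    """Recover the creation time stored in a version 1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    unix_seconds = (value.time - GREGORIAN_TO_UNIX) / 1e7
    return datetime.datetime.fromtimestamp(unix_seconds, tz=datetime.timezone.utc)

example = uuid.UUID("d2177dd0-eaa2-11de-a572-001b779c76e3")

print(example.version)                  # UUID version field
print(format(example.node, "012x"))     # 48-bit node (MAC address) field
print(example.clock_seq)                # 14-bit clock sequence
print(timeuuid_to_datetime(example))    # embedded timestamp in UTC

fresh = uuid.uuid1()  # client-side generation of a new time-based UUID
print(fresh)
```

Roughly speaking, `uuid.uuid1()` plays the role that now() plays in CQL, and `timeuuid_to_datetime` the role of dateOf().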
Selecting Data
SELECT * FROM users;
SELECT COUNT(*) FROM users;
SELECT * FROM users LIMIT 25;
SELECT * FROM users WHERE city = 'Boston';
(city must be part of the primary key or have a secondary index)
CREATE TABLE users (
id uuid,
first_name varchar,
last_name varchar,
age int,
city varchar,
PRIMARY KEY (id, city, age)
);
Inserting Data
INSERT INTO users (id, first_name, last_name, age, city)
VALUES (now(), 'John', 'Doe', 33, 'Seattle');
INSERT INTO users (id, first_name, last_name, emails)
VALUES('frodo', 'Frodo', 'Baggins', {'f@baggins.com',
'baggins@gmail.com'});
NOTES:
• If a row with the same primary key already exists, its columns are updated
• You can qualify table names by keyspace
Updating Data
-- These statements assume a users table whose primary key is id alone
UPDATE users SET age = 34
WHERE id = cfd66ccc-d857-4e90-b1e5-df98a3d40cd6;
UPDATE users SET age = 34, city = 'Portland'
WHERE id = cfd66ccc-d857-4e90-b1e5-df98a3d40cd6;
UPDATE users
SET todo =
{ '2012-9-24' : 'enter mordor',
'2012-10-2 12:00' : 'throw ring into mount doom' }
WHERE user_id = 'frodo';
Deleting Data
DELETE email, phone
FROM users
WHERE user_name = 'jsmith';
DELETE todo ['2012-9-24'] FROM users WHERE user_id = 'frodo';
Altering Tables
-- Change Type
ALTER TABLE users ALTER age TYPE int;
-- Add Column
ALTER TABLE users ADD state varchar;
-- Drop Column
ALTER TABLE users DROP city;
Thanks for your time!
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Ad

Apache Cassandra introduction

  • 1. Cloud Computing: Apache Cassandra. Omid Mirabdolazimi, Fardin Jamshidi. Professor: Dr. Sadegh Dorri Nogoorani. 1397/10/09
  • 2. What Is Apache Cassandra? • An open-source distributed database management system for handling huge amounts of data across many commodity servers • Cassandra is a “NoSQL” or “non-relational” database and can be described as a mix between a “key-value store” and a “column-oriented” database
  • 3. Where Did Cassandra Come From? • Cassandra was initially created at Facebook • It combines ideas from Google Bigtable and Amazon Dynamo • It was created to power the “Inbox Search” feature • Cassandra was released as open source in July 2008 • It became an Apache Incubator project in February 2009 and a top-level project a year after that
  • 4. Cassandra Architecture • Built with the understanding that hardware & software failures can happen • Peer to Peer Architecture • All nodes are the same • Read/Write Anywhere • Gossip Protocol • Commit Log Captures All Activity • Well suited for cloud deployments
  • 5. Features of Cassandra • Decentralized – No master and no single point of failure; data is distributed across the cluster • Replication – Tailored for multi-data-center deployment • Scalability – New machines can easily be added with no downtime or interruption • Fault Tolerance – Failed nodes can be replaced with no downtime • Cassandra Query Language (CQL) – An SQL-like alternative
  • 6. More Advantages • Always-On Architecture – Continuous availability with no downtime • Performance that scales linearly as nodes are added • Operational Simplicity – Administration is simplified • Transaction Support – Row-level atomicity and lightweight transactions rather than full relational ACID transactions • No New Equipment Required – Runs on commodity hardware, which keeps it economical
  • 7. CQL (Cassandra Query Language) • CQL is very similar to SQL (Structured Query Language) in terms of syntax and commands • Statements directly change data and/or change the way data is stored • All statements end with a semicolon, for example: SELECT * FROM sampletable;
  • 8. Who Uses Cassandra? • Facebook • WalmartLabs • Constant Contact • Digg • AppScale • Netflix • Twitter • Zoho • IBM • FormSpring • Cisco WebEx • Rackspace • OpenX • Adobe • Comcast • eBay
  • 9. Apache Cassandra Architecture & Data Model Cloud Computing
  • 10. The Data Model • Cassandra’s data model is in a class of its own, but it can be described as a hybrid of a “key-value store” and a “column-oriented” database • Cassandra was modeled after Google’s Bigtable and Amazon’s Dynamo • Cassandra also has similarities to a relational database, but it is much more flexible
  • 11. Keyspaces • A Keyspace is a “container” for your data • Similar to a Database in an RDBMS • Used to group Column Families together • Typically, a Cluster has one Keyspace per application • Replication is controlled on a per-keyspace basis
  • 12. Column Family • A “Column Family” is similar to a “Table” in an RDBMS because it has columns and rows • Relational database tables use a predefined, fixed schema. Column families do not, which makes them very flexible • Cassandra’s data model promotes “denormalization”, the opposite of the normalization favored in relational database design
  • 14. Replication Strategies • SimpleStrategy – Use this for a single data center. It places the first replica on a node determined by the partitioner and the remaining replicas on the next nodes clockwise around the ring. It does not consider topology. • NetworkTopologyStrategy – Use this if you plan to have your cluster span multiple data centers. It specifies how many replicas you want in each data center.
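The SimpleStrategy placement on slide 14 can be sketched roughly as follows. This is a minimal illustration, not Cassandra's implementation: the `token_for` hash (MD5 truncated to 32 bits), the four-node ring, and all function and node names are assumptions made for the example.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    # Stand-in for the partitioner (assumption): MD5 reduced to a 32-bit token.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def simple_strategy_replicas(ring, key, replication_factor):
    """Pick replicas by walking the ring clockwise from the key's token.

    ring maps a token (int) to a node name; topology is ignored on purpose.
    """
    tokens = sorted(ring)
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    count = min(replication_factor, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


# Hypothetical four-node ring with evenly spaced tokens.
ring = {0: "node-a", 2**30: "node-b", 2**31: "node-c", 3 * 2**30: "node-d"}
replicas = simple_strategy_replicas(ring, "user42", 3)
print(replicas)
```

With a replication factor of 3, the key lands on three consecutive nodes of the ring, regardless of which rack or data center they sit in.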
  • 17. Apache Cassandra Relational Database Comparison Cloud Computing
  • 18. Relational Databases • Relational databases have been around for 40+ years and will always have a place in technology. • There are many situations where a relational database is the best choice. • NoSQL databases are NOT going to replace relational databases in all areas.
  • 19. Advantages of NoSQL • Handles Big Data Much Better Than RDBMS • Cheaper Hardware & Software • Easier Scaling • Much More Flexible (Relatively Schema-Free) • Map and Reduce Capability • Less Management
  • 20. Advantages of RDBMS • Better For Complex Data • Better Support • Better For “Relational” or “Object-Oriented” Data • Been Around Longer • Better Data Analytics
  • 21. Scale-Up vs. Scale-Out • Scale-Up Architecture (Relational) – Capacity is confined to a single node, so scaling means adding more resources (CPU, memory, etc.) to that node. • Scale-Out Architecture (NoSQL) – Total capacity can be expanded as needed by adding nodes. When one storage array reaches its maximum, another picks up where the last left off. This makes scaling significantly easier and less expensive.
  • 22. Things To Consider When choosing a system/database for your projects or company, there are many things to factor in… • Complexity • Type of Data • Budget • Amount of Data • Traffic • Programming Languages
  • 23. Apache Cassandra How It Works Cloud Computing
  • 24. The “Write” Process • Logs data to the commit log • Writes data to the memtable • Flushes data from the memtable • Stores data on disk in SSTables • Compaction
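The five steps on slide 24 can be sketched as a toy model. This is only an illustration under stated assumptions: the `Node` class, its attribute names, and the tiny `memtable_limit` threshold are hypothetical, and everything is kept in memory rather than on disk.

```python
class Node:
    """Toy model of one node's write path (all names are hypothetical)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory key -> value
        self.sstables = []     # immutable tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log to the commit log
        self.memtable[key] = value            # 2. write to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3-4. flush the memtable into a sorted, immutable SSTable
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def compact(self):
        # 5. merge all SSTables; newer tables overwrite older values
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]


node = Node()
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    node.write(key, value)
node.compact()
print(node.sstables)
```

The sketch keeps the whole commit log forever for simplicity; it only shows the ordering of the steps, with every write logged before it reaches the memtable.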
  • 26. Deleting Data • Cassandra does not delete data in place because SSTables are immutable • Instead, Cassandra marks the data with a “Tombstone”, a marker in a row that indicates that a column was deleted
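A rough sketch of how a tombstone hides a value without touching the older, immutable SSTable. The `TOMBSTONE` sentinel, the dictionary-based tables, and the sample values are assumptions for illustration only; real Cassandra also keeps tombstones for a grace period before compaction purges them, which is not modeled here.

```python
TOMBSTONE = object()  # sentinel standing in for a tombstone marker


def read(sstables, key):
    """Return the newest value for key, or None if missing or deleted."""
    for table in reversed(sstables):  # newest SSTable first
        if key in table:
            value = table[key]
            return None if value is TOMBSTONE else value
    return None


def compact(sstables):
    """Merge SSTables (newest wins) and drop keys hidden by a tombstone."""
    merged = {}
    for table in sstables:
        merged.update(table)
    return [{k: v for k, v in merged.items() if v is not TOMBSTONE}]


# The delete is recorded as a new entry; the old table is never modified.
sstables = [{"name": "Frodo", "city": "Shire"}]
sstables.append({"city": TOMBSTONE})

print(read(sstables, "city"))
print(compact(sstables))
```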
  • 27. The “Read” Process • Any node can serve a read, so reads are handled in parallel across the nodes in a cluster • If the node with the requested data is down, the data is read from another node that holds a replica of the data
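The fallback behaviour on slide 27 can be sketched as below. The node dictionaries, the `up` flag, and the function name are hypothetical, and consistency levels are not modeled at all.

```python
def read_from_replicas(replicas, key):
    """Return the value from the first replica that is up."""
    for node in replicas:
        if node["up"]:
            return node["data"].get(key)
    raise RuntimeError("no replica available for key %r" % (key,))


# Hypothetical replica set: the first node is down, the second one answers.
replicas = [
    {"name": "node-a", "up": False, "data": {"user42": "Frodo"}},
    {"name": "node-b", "up": True, "data": {"user42": "Frodo"}},
]
print(read_from_replicas(replicas, "user42"))
```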
  • 29. Compatible Operating Systems • GNU/Linux • Microsoft Windows • Mac OS X
  • 30. Prerequisites & Requirements • Requires the latest stable release of Java 7 • 4GB+ of memory is recommended for production environments
  • 31. Cassandra Server • Apache Cassandra Core Server • Nodetool Admin Command Line Interface • CQLSH and Cassandra-cli Development Shells • Apache provides binary tarballs and Debian packages: http://cassandra.apache.org/download/
  • 32. DataStax Community Distribution DataStax Community Edition is a free software package that offers… • Apache Cassandra Core Server • Nodetool Admin Command Line Interface • CQLSH and Cassandra-cli Development Shell • Windows Installer & Mac OS X Binary • OpsCenter Community Version • Sample Database & Application • CQL Utility
  • 33. DataStax Enterprise Distribution DataStax Enterprise Edition is a premium software package that offers… • All that Community Offers • OpsCenter Enterprise Version • Apache Hadoop (With MapReduce, Hive & Pig) • Enterprise Search • Premium Support • Advanced Security Features • Workload Management Benefits
  • 34. OpsCenter • Simplified Data Management • Easy To Use Visual Interface • Centralized Dashboard • Easy Installation • Real-Time Analytics • Multiple Data Center Support • Automated Management • Rebalance & Repair Clusters • Alerts & Notifications
  • 35. Available Client Drivers There are client drivers available for the following languages: Go, Node.js, Clojure, C++, Java, Python, Ruby, C# / .NET
  • 37. Defining Keyspaces in CQL
    Single data center:
    CREATE KEYSPACE people WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
    Multiple data centers:
    CREATE KEYSPACE people WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 };
  • 38. Compound Keys A compound primary key includes the partition key, which determines on which node data is stored, and one or more additional columns that determine clustering. You need to know which fields you want to be able to sort and order by before you create the data model.
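To make the two roles of a compound key concrete, here is a small sketch in which the partition key groups rows together and the clustering column keeps them sorted inside that group. The `Table` class and its methods are hypothetical and model a single node only; they are not how a driver or Cassandra itself exposes this.

```python
import bisect
from collections import defaultdict


class Table:
    """Rows grouped by partition key and sorted by clustering key."""

    def __init__(self):
        # partition key -> list of (clustering key, values), kept sorted
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, values):
        rows = self.partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, values)      # same primary key: overwrite
        else:
            rows.insert(i, (clustering_key, values))

    def select(self, partition_key):
        return list(self.partitions.get(partition_key, []))


emp = Table()
emp.insert(1, 30, {"last_name": "Doe"})
emp.insert(1, 10, {"last_name": "Doe"})
emp.insert(1, 20, {"last_name": "Doe"})
print([dept for dept, _ in emp.select(1)])
```

Because the order is fixed when the data is written, the sort columns have to be chosen before the table is created, which is the point the slide makes.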
  • 39. Creating Compound Keys To create a compound primary key, use the keywords PRIMARY KEY followed by a comma-separated list of column names enclosed in parentheses.
    CREATE TABLE emp (
      empID int,
      deptID int,
      first_name varchar,
      last_name varchar,
      PRIMARY KEY (empID, deptID)
    );
  • 40. UUIDs A universally unique identifier (UUID) is a field type that is used to avoid collisions in column names. It is written as 32 hex digits (0-9 or a-f, case-insensitive) separated by dashes after the 8th, 12th, 16th, and 20th digits. For example: 01234567-0123-0123-0123-0123456789ab
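The textual format described on slide 40 can be checked with Python's standard `uuid` module. This is only a client-side illustration of the format; the variable names are assumptions, and it does not involve Cassandra.

```python
import uuid

example = "01234567-0123-0123-0123-0123456789ab"
parsed = uuid.UUID(example)

# Five groups of hex digits: 8, 4, 4, 4 and 12 characters (32 in total).
group_lengths = [len(part) for part in example.split("-")]

# Hex digits are case-insensitive, so both spellings are the same value.
same_when_uppercase = uuid.UUID(example.upper()) == parsed

# A fresh random (version 4) UUID, usable as a collision-resistant key.
random_id = uuid.uuid4()

print(group_lengths, same_when_uppercase, random_id)
```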
  • 41. TIMEUUIDs A timeuuid is a time-based (version 1) universally unique identifier field type that is used to avoid collisions in column names. It combines the time in 100-nanosecond intervals since 00:00:00.00 UTC on 15 October 1582 (60 bits), a clock sequence number for prevention of duplicates (14 bits), and the IEEE 802 MAC address (48 bits) to generate a unique identifier. For example: d2177dd0-eaa2-11de-a572-001b779c76e3 Functions for working with timeuuids include now(), dateOf(), minTimeuuid(), and maxTimeuuid().
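The three fields of a timeuuid can be pulled apart with Python's standard `uuid` module, roughly as sketched here. The `decode_timeuuid` helper is hypothetical; the timestamp conversion assumes the version 1 UUID epoch of 15 October 1582 and 100-nanosecond ticks.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100 ns ticks from this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def decode_timeuuid(text):
    """Split a version 1 UUID into timestamp, clock sequence and node."""
    u = uuid.UUID(text)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # u.time is the 60-bit tick count; 10 ticks make one microsecond.
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    return {
        "timestamp": created,
        "clock_seq": u.clock_seq,     # 14-bit clock sequence
        "node": f"{u.node:012x}",     # 48-bit node (MAC address) as hex
    }


fields = decode_timeuuid("d2177dd0-eaa2-11de-a572-001b779c76e3")
print(fields)
```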
  • 42. Selecting Data
    SELECT * FROM users;
    SELECT COUNT(*) FROM users;
    SELECT * FROM users LIMIT 25;
    SELECT * FROM users WHERE city = 'Boston';
    (city must be part of the primary key or have an index; filtering on a clustering column alone, as with the table below, also requires ALLOW FILTERING)
    CREATE TABLE users (
      id uuid,
      first_name varchar,
      last_name varchar,
      age int,
      city varchar,
      PRIMARY KEY (id, city, age)
    );
  • 43. Inserting Data
    INSERT INTO users (id, first_name, last_name, age, city) VALUES (now(), 'John', 'Doe', 33, 'Seattle');
    INSERT INTO users (id, first_name, last_name, emails) VALUES ('frodo', 'Frodo', 'Baggins', {'[email protected]', '[email protected]'});
    NOTES: • If a row with the same primary key already exists, it is updated • You can qualify table names by keyspace
  • 44. Updating Data
    UPDATE users SET age = 34 WHERE id = cfd66ccc-d857-4e90-b1e5-df98a3d40cd6;
    UPDATE users SET age = 34, city = 'Portland' WHERE id = cfd66ccc-d857-4e90-b1e5-df98a3d40cd6;
    UPDATE users SET todo = { '2012-9-24' : 'enter mordor', '2012-10-2 12:00' : 'throw ring into mount doom' } WHERE user_id = 'frodo';
  • 45. Deleting Data
    DELETE email, phone FROM users WHERE user_name = 'jsmith';
    DELETE todo['2012-9-24'] FROM users WHERE id = 'frodo';
  • 46. Altering Tables
    Change type: ALTER TABLE users ALTER age TYPE int;
    Add column: ALTER TABLE users ADD state varchar;
    Drop column: ALTER TABLE users DROP city;