Pratham SQL 5
PRACTICAL NO: 05
Prerequisite Theory
In Cassandra, managing keyspaces involves creating, altering, and dropping them. Here are examples of how you can perform these actions using the Cassandra Query Language (CQL):
Create Keyspace:
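A minimal sketch of `CREATE KEYSPACE`. The keyspace names and the data-center name 'datacenter1' are placeholders, and the replication factors are illustrative only.

```cql
-- SimpleStrategy: one replication factor for the whole cluster
-- (a factor of 1 is enough for a single-node lab cluster)
CREATE KEYSPACE IF NOT EXISTS your_keyspace
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- NetworkTopologyStrategy: a replica count per data center
-- ('datacenter1' is an assumed name; use the data-center names your snitch reports)
CREATE KEYSPACE IF NOT EXISTS your_keyspace_multi_dc
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};
```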
Alter Keyspace:
To alter an existing keyspace, use the `ALTER KEYSPACE` statement. This can
include modifying the replication strategy, replication factor, or other configuration
options.
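A sketch of changing the replication settings of an existing keyspace. The names and the factor are placeholders.

```cql
-- Switch the placeholder keyspace to NetworkTopologyStrategy with 3 replicas in 'datacenter1'
ALTER KEYSPACE your_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};
```

Raising the replication factor does not copy existing data by itself. Running `nodetool repair` on the nodes afterwards is generally needed so the new replicas receive it.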
Drop Keyspace:
To drop (delete) a keyspace and all its data, use the `DROP KEYSPACE` statement. Be
cautious, as this operation is irreversible.
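A minimal sketch, assuming the placeholder keyspace from the create example. With the default `auto_snapshot: true` in cassandra.yaml, Cassandra typically keeps a snapshot of the dropped tables on disk, which is the usual recovery route.

```cql
-- IF EXISTS avoids an error when the keyspace has already been dropped
DROP KEYSPACE IF EXISTS your_keyspace;
```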
These statements provide the basic syntax for creating, altering, and dropping keyspaces in
Cassandra. Customize the keyspace names, replication strategies, and other parameters
based on your specific use case and requirements.
In Cassandra, you can manage tables using the Cassandra Query Language (CQL).
Below are examples of creating, altering, and dropping tables:
Create Table:
To create a table, use the `CREATE TABLE` statement. Replace 'your_table' with the
desired table name and specify the columns along with their data types.
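A sketch using the placeholder names from this section. The columns match the `INSERT` example in the CRUD section. The second table, your_table_by_user, is hypothetical and shows a compound primary key.

```cql
USE your_keyspace;

-- Simple primary key: id is the partition key
CREATE TABLE IF NOT EXISTS your_table (
    id    UUID PRIMARY KEY,
    name  TEXT,
    age   INT,
    email TEXT
);

-- Compound primary key: user_id partitions the data,
-- created_at orders rows inside each partition (newest first)
CREATE TABLE IF NOT EXISTS your_table_by_user (
    user_id    UUID,
    created_at TIMESTAMP,
    message    TEXT,
    PRIMARY KEY (user_id, created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);
```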
Drop Table:
To drop (delete) a table, use the `DROP TABLE` statement. Be cautious, as this
operation permanently removes the table and its data.
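The summary below also mentions altering tables. A sketch of both operations follows; the column name phone is hypothetical.

```cql
-- Add a column, then remove it again
ALTER TABLE your_table ADD phone TEXT;
ALTER TABLE your_table DROP phone;

-- Permanently remove the table and all of its data
DROP TABLE IF EXISTS your_table;
```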
These statements provide the basic syntax for creating, altering, and dropping tables in
Cassandra. Customize the table names, column definitions, and other parameters based on
your specific use case and requirements.
In Cassandra, the `TRUNCATE` statement removes all data from a table while keeping the table structure intact. The result is similar to deleting every row, but it is more efficient: instead of writing a tombstone for each deleted row, Cassandra discards the table's data files (SSTables) directly. `TRUNCATE` requires all nodes that hold the table to be up.
Truncate Table:
TRUNCATE your_table;
Replace 'your_table' with the name of the table you want to truncate. This statement
will remove all rows from the specified table, but the table structure, including column
definitions and indexes, will remain unchanged.
In Cassandra, you can create and drop indexes using the Cassandra Query Language
(CQL). Below are examples of creating and dropping an index on a table:
Create Index:
To create an index on a column, use the `CREATE INDEX` statement. A secondary index lets you filter queries on a column that is not part of the primary key. Without it, such queries are rejected unless `ALLOW FILTERING` is added.
Replace 'your_index_name' with the desired name for the index, 'your_table' with
the table name, and 'your_column' with the column on which you want to create the index.
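A sketch with the placeholder names from the paragraph above. The value 'some value' is illustrative.

```cql
CREATE INDEX IF NOT EXISTS your_index_name ON your_table (your_column);

-- Filtering on the non-key column is now accepted without ALLOW FILTERING
SELECT * FROM your_table WHERE your_column = 'some value';
```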
Drop Index:
To drop (delete) an index, use the `DROP INDEX` statement. This removes the specified
index from the table.
Replace 'your_index_name' with the name of the index you want to drop. This
statement removes the index from the table but does not affect the table structure or data.
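Index names are scoped to the keyspace, so the keyspace can be given explicitly. A sketch with placeholder names:

```cql
DROP INDEX IF EXISTS your_keyspace.your_index_name;
```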
Consider the implications carefully before adding or removing indexes. Each index adds work to every write on the table and consumes extra storage, which affects the performance and storage requirements of your Cassandra database.
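Batch:
A batch groups several `INSERT`, `UPDATE`, and `DELETE` statements so that they are applied together. The general form is:
BEGIN BATCH
  <one or more INSERT, UPDATE, or DELETE statements>
APPLY BATCH;
Example:
BEGIN BATCH
INSERT INTO your_table (id, name, age) VALUES (uuid(), 'pratham', 18);
APPLY BATCH;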
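A batch usually contains more than one statement. A sketch that mixes statement types is shown below; the two UUIDs are hypothetical ids of existing rows.

```cql
-- Logged batch (the default): all statements are eventually applied, or none are
BEGIN BATCH
  INSERT INTO your_table (id, name, age) VALUES (uuid(), 'pratham', 18);
  INSERT INTO your_table (id, name, age) VALUES (uuid(), 'mayur', 18);
  UPDATE your_table SET age = 19 WHERE id = 123e4567-e89b-12d3-a456-426614174000;
  DELETE FROM your_table WHERE id = 123e4567-e89b-12d3-a456-426614174001;
APPLY BATCH;
```

Batches are meant for atomicity, not speed. A batch that spans many partitions puts extra load on the coordinator node.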
CRUD OPERATIONS
In Cassandra, you can perform basic data manipulation operations such as insert, select,
update, and delete using the Cassandra Query Language (CQL). Here are examples of each
operation:
Insert:
INSERT INTO your_table (id, name, age, email) VALUES (uuid(), 'mayur', 18,
'[email protected]');
Replace 'your_table' with the name of your table and adjust the values accordingly.
Select:
Replace 'your_table' with the table name and `<some_id>` with the specific identifier.
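A sketch in which the UUID stands in for `<some_id>` and is hypothetical. Values of a uuid column are written as unquoted literals.

```cql
-- All rows (a full scan; acceptable for small lab tables)
SELECT * FROM your_table;

-- One row by primary key
SELECT id, name, age, email FROM your_table
WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```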
Update:
Replace 'your_table' with the table name, `<some_id>` with the specific identifier, and
adjust the values accordingly.
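A sketch with a hypothetical UUID and a hypothetical new e-mail value:

```cql
UPDATE your_table SET age = 19, email = 'new_address@example.com'
WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```

In Cassandra, `UPDATE` is an upsert: if no row with that id exists, one is created. Appending `IF EXISTS` turns it into a conditional (lightweight transaction) update, which is slower.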
Delete:
Replace 'your_table' with the table name and `<some_id>` with the specific identifier.
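A sketch with a hypothetical UUID, showing both a whole-row delete and a single-column delete:

```cql
-- Delete the whole row
DELETE FROM your_table WHERE id = 123e4567-e89b-12d3-a456-426614174000;

-- Delete only the email value, keeping the rest of the row
DELETE email FROM your_table WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```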
CASSANDRA COLLECTIONS
Cassandra collections let you store multiple elements in a single column. `SET`, `LIST`, and `MAP` are the three collection types. They are useful when a column naturally holds several items. Here's a brief overview of each:
SET:
A `SET` is a collection of unique elements of the same data type. Duplicate values are not allowed, and Cassandra returns the elements in sorted order rather than insertion order.
Example:
CREATE TABLE example_set (
    id UUID PRIMARY KEY,
    tags SET<TEXT>
);
INSERT INTO example_set (id, tags) VALUES (uuid(), {'tag1', 'tag2', 'tag3'});
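LIST:
A `LIST` is an ordered collection of elements of the same data type. Unlike a set, it allows duplicates, and elements keep their position, so they can be accessed by index.
Example:
CREATE TABLE example_list (
    id UUID PRIMARY KEY,
    comments LIST<TEXT>
);
INSERT INTO example_list (id, comments) VALUES (uuid(), ['Comment 1', 'Comment 2', 'Comment 3']);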
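MAP:
A `MAP` is a collection of key-value pairs. Keys are unique; all keys share one data type, and all values share one data type.
Example:
CREATE TABLE example_map (
    id UUID PRIMARY KEY,
    properties MAP<TEXT, TEXT>
);
INSERT INTO example_map (id, properties) VALUES (uuid(), {'key1': 'value1', 'key2': 'value2'});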
In these examples, `SET`, `LIST`, and `MAP` store collections of tags, comments, and properties, respectively. Although these collection types offer flexibility, their use should match your data-modeling needs and query patterns. A collection is read as a whole, so keep collections small. For large or unbounded groups of values, a separate table is usually a better fit for performance and scalability.
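Collections are usually modified in place rather than rewritten. A sketch of the common operations on the three example tables follows. It assumes 123e4567-e89b-12d3-a456-426614174000 is the id of a row that already exists in each table (a hypothetical value; take a real id from a `SELECT`). Setting a list element by index fails if the list does not exist yet.

```cql
-- SET: add and remove elements
UPDATE example_set SET tags = tags + {'tag4'} WHERE id = 123e4567-e89b-12d3-a456-426614174000;
UPDATE example_set SET tags = tags - {'tag1'} WHERE id = 123e4567-e89b-12d3-a456-426614174000;

-- LIST: append, prepend, and replace the element at index 0
UPDATE example_list SET comments = comments + ['Comment 4'] WHERE id = 123e4567-e89b-12d3-a456-426614174000;
UPDATE example_list SET comments = ['Comment 0'] + comments WHERE id = 123e4567-e89b-12d3-a456-426614174000;
UPDATE example_list SET comments[0] = 'Edited comment' WHERE id = 123e4567-e89b-12d3-a456-426614174000;

-- MAP: set one key, then remove another
UPDATE example_map SET properties['key3'] = 'value3' WHERE id = 123e4567-e89b-12d3-a456-426614174000;
DELETE properties['key1'] FROM example_map WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```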
Monitoring a Cassandra cluster involves using tools and techniques to assess its
health, performance, and other relevant metrics.
Displays the status of each node in the cluster, including its state (UN = Up/Normal, DN = Down/Normal), load, and tokens.
Provides information about the cluster, including the Cassandra version, data center, and rack.
Displays the token ring information, showing the distribution of tokens across the cluster.
These are just a few examples of the numerous `nodetool` commands available. Running
`nodetool` without any arguments provides a list of available commands and their
descriptions. Always refer to the official documentation for the specific version of
Cassandra you are using for the most accurate and up-to-date information.
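Some of the same node details can also be read with plain CQL from the system tables, which helps when only cqlsh access is available. A sketch is shown below; the columns listed exist in `system.local` and `system.peers` in Cassandra 3.x.

```cql
-- Cluster name, version, data center and rack of the node cqlsh is connected to
SELECT cluster_name, release_version, data_center, rack FROM system.local;

-- The same details for the other nodes this node knows about
SELECT peer, data_center, rack, release_version FROM system.peers;
```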
Example performance-related settings in cassandra.yaml (Cassandra 3.11):
file_cache_size_in_mb: 512
disk_access_mode: mmap
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
concurrent_reads: 32
concurrent_writes: 32
native_transport_max_threads: 2048
endpoint_snitch: GossipingPropertyFileSnitch
read_request_timeout_in_ms: 5000
write_request_timeout_in_ms: 2000
# Consistency levels are not cassandra.yaml settings. They are chosen per request by the
# client driver, or with the CONSISTENCY command in cqlsh (for example, ONE for reads and
# LOCAL_QUORUM for writes).
# JVM and garbage collection options are not set in cassandra.yaml either. In 3.11 they go
# in conf/jvm.options. Adjust them based on your JVM version and garbage collection strategy:
# -XX:+UseG1GC
# -XX:MaxGCPauseMillis=500
Always refer to the official Cassandra 3.11 documentation for detailed information on these settings.
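Consistency levels are chosen per request rather than in the server configuration. In cqlsh this is done with the `CONSISTENCY` shell command, which is a cqlsh feature and not part of CQL itself. A sketch using the placeholder table from earlier sections:

```cql
-- cqlsh command: applies to the following requests in this session
CONSISTENCY LOCAL_QUORUM;
INSERT INTO your_table (id, name, age) VALUES (uuid(), 'pratham', 18);

CONSISTENCY ONE;
SELECT * FROM your_table;
```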
COMPACTION STRATEGY
Compaction is the process of merging multiple SSTables (Sorted String Tables) into
a smaller number of SSTables, reducing storage space and improving read performance.
Cassandra provides different compaction strategies, each with its own advantages and use
cases. Here are some common compaction strategies in Cassandra:
SizeTieredCompactionStrategy (STCS):
• Description: Groups SSTables of similar size and compacts them into larger ones.
• Use Case: Suitable for write-intensive workloads with uniform data distribution. It is the default strategy.
The compaction strategy is set per table in CQL, not in cassandra.yaml:
ALTER TABLE your_table WITH compaction = {'class': 'SizeTieredCompactionStrategy'};
LeveledCompactionStrategy (LCS):
• Description: Organizes SSTables into levels, where each level holds about ten times as much data as the one before it and SSTables within a level do not overlap. Data is compacted from one level into the next.
• Use Case: Suitable for read-heavy workloads; provides more predictable and tunable compaction.
ALTER TABLE your_table WITH compaction = {'class': 'LeveledCompactionStrategy'};
TimeWindowCompactionStrategy (TWCS):
• Description: Groups SSTables based on time intervals and compacts data within each time window.
• Use Case: Suitable for time-series data, where older data can be compacted separately from newer data.
ALTER TABLE your_table WITH compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': 1};
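DateTieredCompactionStrategy (DTCS):
• An earlier strategy for time-series data. It is deprecated in favor of TimeWindowCompactionStrategy and should not be used for new tables.
ALTER TABLE your_table WITH compaction = {'class': 'DateTieredCompactionStrategy'};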
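SizeTieredCompactionStrategy with sub-options:
ALTER TABLE your_table WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 4, 'max_threshold': 32};
Here min_threshold (default 4) is the minimum number of similar-sized SSTables that triggers a compaction, and max_threshold (default 32) caps how many are compacted at once.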
Choose the compaction strategy based on your specific use case, workload
characteristics, and performance requirements. Always monitor and test different
strategies in a controlled environment to determine the most effective one for your
Cassandra deployment.
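To check which strategy a table actually uses while testing, its settings can be read from the schema tables (Cassandra 3.0 and later). A sketch with placeholder keyspace and table names:

```cql
-- The compaction column is a map holding the class and its options
SELECT keyspace_name, table_name, compaction
FROM system_schema.tables
WHERE keyspace_name = 'your_keyspace' AND table_name = 'your_table';
```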
1. Create Keyspace:
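Step 2 describes moving this keyspace to SimpleStrategy "instead". This sketch therefore assumes it starts with NetworkTopologyStrategy, a data center named 'datacenter1', and 3 replicas; all three are assumptions.

```cql
CREATE KEYSPACE IF NOT EXISTS ecommerce
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};

USE ecommerce;
```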
2. Alter Keyspace:
This alters the "ecommerce" keyspace to use the "SimpleStrategy" instead. This strategy
replicates data to a specified number of nodes (2 in this case) regardless of their location.
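The statement described above, as a sketch:

```cql
-- SimpleStrategy with 2 replicas, placed without regard to data center or rack
ALTER KEYSPACE ecommerce
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
```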
3. Create Table:
• product_id: This is a UUID (Universally Unique Identifier) and is the primary key of
the table.
• product_name: This is a text column that stores the product name.
• price: This is a double-precision floating-point column that stores the product price.
• stock_quantity: This is an integer column that stores the product's stock quantity.
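The table described by the four columns above, as a sketch:

```cql
CREATE TABLE IF NOT EXISTS ecommerce.products (
    product_id     UUID PRIMARY KEY,  -- partition key
    product_name   TEXT,
    price          DOUBLE,
    stock_quantity INT
);
```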
4. Alter Table:
This alters the "products" table to add a new column named "manufacturer" of type text.
This will allow you to store the manufacturer information for each product.
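The corresponding statement, as a sketch:

```cql
ALTER TABLE ecommerce.products ADD manufacturer TEXT;
```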
5. Create Index:
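The index name idx_product_name comes from the drop step at the end of this practical. This sketch assumes, from that name, that the index is on the product_name column.

```cql
CREATE INDEX IF NOT EXISTS idx_product_name ON ecommerce.products (product_name);
```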
6. Insert Data:
This inserts a new product with the specified details into the "products" table. The uuid()
function generates a unique identifier for the product.
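A sketch that assumes the manufacturer column from step 4 exists. The product values are hypothetical.

```cql
INSERT INTO ecommerce.products (product_id, product_name, price, stock_quantity, manufacturer)
VALUES (uuid(), 'Wireless Mouse', 599.0, 50, 'Acme');
```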
7. Select Data:
This retrieves all data from the "products" table, including the newly inserted product.
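A sketch of the full read, plus a lookup that relies on the product_name index. The product name is the hypothetical value from the insert sketch.

```cql
SELECT * FROM ecommerce.products;

-- Filtering on a non-key column, served by the secondary index
SELECT product_id, price, stock_quantity FROM ecommerce.products
WHERE product_name = 'Wireless Mouse';
```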
8. Update Stock:
This updates the stock quantity of the product with the specified product_id (some_id) to 40. Remember to replace some_id with the actual product ID you want to update. Because product_id is a UUID, write it as an unquoted UUID literal.
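A sketch in which the UUID is hypothetical. Copy a real product_id from the `SELECT` output.

```cql
UPDATE ecommerce.products SET stock_quantity = 40
WHERE product_id = 123e4567-e89b-12d3-a456-426614174000;
```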
9. Delete Product:
This deletes the product with the specified product_id (some_id) from the "products" table. Again, replace some_id with the actual product ID you want to delete, written as an unquoted UUID literal.
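A sketch with a hypothetical UUID:

```cql
DELETE FROM ecommerce.products
WHERE product_id = 123e4567-e89b-12d3-a456-426614174000;
```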
nodetool status
This command displays the status of the Cassandra nodes in your cluster, including each node's state, load, and token ownership.
nodetool compactionstats
This command shows information about ongoing compactions in your Cassandra cluster.
Compaction is a process that optimizes data storage by merging smaller data files into
larger ones.
nodetool snapshot ecommerce
This command takes a snapshot of the "ecommerce" keyspace. Snapshots are backups that you can use to restore your data if needed.
TRUNCATE products;
This removes all data from the "products" table but keeps the table structure intact.
To drop the index “idx_product_name” and the table “products”, use the following
commands:
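A sketch of those two statements, assuming the keyspace is ecommerce as in step 1. Dropping a table also drops its secondary indexes, so the first statement is optional if the table is dropped anyway.

```cql
DROP INDEX IF EXISTS ecommerce.idx_product_name;
DROP TABLE IF EXISTS ecommerce.products;
```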
QUERY NO: 02
✓ Design a data model for a music streaming service. Consider entities such as users,
songs, playlists, and play history. Define tables and relationships to efficiently support
queries for user-specific playlists, recently played songs, and popular songs.
✓ Write a CQL query to find the top 5 most played songs in the last month. Consider
using appropriate aggregation functions and time-based filtering.
✓ Create a secondary index on a non-primary key column of the songs table. Write a
query to retrieve all songs released in a specific year using this secondary index.
✓ Implement a batch operation to update the order of songs in a user's playlist.
Demonstrate how batching can be utilized to ensure atomicity for multiple update
operations.
✓ Design a table to store real-time analytics data for user interactions (likes, shares,
skips) with songs. Write a query to retrieve songs with the highest engagement in the
last 24 hours.
✓ Identify a slow-performing query in your data model. Use appropriate techniques (indexes, denormalization) to optimize the query's performance, and compare the execution times before and after optimization.
✓ Simulate a scenario with a large data set of songs and users. Write a query to retrieve
songs that have not been played by a specific user, considering efficient handling of
large data.
✓ Create a query to retrieve songs based on a user's preferences, considering factors
like genre, tempo, and artist. Optimize the query for minimal response time.
Designing a data model for a music streaming service involves considering various entities such as users, songs, playlists, play history, and real-time analytics. Here's a step-by-step design:
Tables:
1. users (user_id uuid PRIMARY KEY, username text, email text, ...)
2. songs (song_id uuid PRIMARY KEY, title text, artist text, album text, release_year int,
genre text, ...)
3. playlists (user_id uuid, playlist_id uuid, name text, PRIMARY KEY (user_id,
playlist_id))
4. playlist_songs (playlist_id uuid, song_id uuid, position int, PRIMARY KEY
(playlist_id, song_id))
5. play_history (user_id uuid, timestamp timestamp, song_id uuid, PRIMARY KEY (user_id, timestamp, song_id)) WITH CLUSTERING ORDER BY (timestamp DESC, song_id ASC)
6. user_preferences (user_id uuid, genre list<text>, tempo int, artist list<text>,
PRIMARY KEY (user_id))
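For "recently played songs", the play time needs to be a clustering column. Otherwise rows are ordered by song_id, and a replay of the same song overwrites the earlier play. A sketch of play_history in CQL that keeps the column names listed above; the user UUID is hypothetical.

```cql
CREATE TABLE IF NOT EXISTS play_history (
    user_id   UUID,
    timestamp TIMESTAMP,   -- when the song was played
    song_id   UUID,
    PRIMARY KEY (user_id, timestamp, song_id)
) WITH CLUSTERING ORDER BY (timestamp DESC, song_id ASC);

-- The 20 most recent plays of one user, newest first
SELECT song_id, timestamp FROM play_history
WHERE user_id = 11111111-1111-1111-1111-111111111111
LIMIT 20;
```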
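Relationships:
• A user has many playlists (playlists is partitioned by user_id).
• A playlist contains many songs, and a song can appear in many playlists (playlist_songs links playlist_id to song_id).
• A user has a play history of many songs (play_history links user_id to song_id).
• Each user has one set of preferences (user_preferences is keyed by user_id).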
This model allows for efficient queries on user-specific playlists, recently played songs, and
popular songs.
CQL Queries:
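Top 5 most played songs in the last month: CQL cannot sort by an aggregate such as count(*), and GROUP BY only works on primary key columns. This query therefore cannot be written as a single query over play_history the way it would be in SQL. A common Cassandra approach is to keep per-song play counts for each month in a counter table and select the 5 highest counts in the application.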
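A sketch of the counter-table approach. The table name song_plays_by_month, the 'YYYY-MM' month format and the song UUID are hypothetical.

```cql
-- In a counter table, every non-key column must be a counter
CREATE TABLE IF NOT EXISTS song_plays_by_month (
    month      TEXT,      -- e.g. '2024-05'
    song_id    UUID,
    play_count COUNTER,
    PRIMARY KEY (month, song_id)
);

-- Increment on every play (in addition to writing the play to play_history)
UPDATE song_plays_by_month SET play_count = play_count + 1
WHERE month = '2024-05' AND song_id = 22222222-2222-2222-2222-222222222222;

-- Read one month's counts; sort by play_count and keep the top 5 in the application
SELECT song_id, play_count FROM song_plays_by_month WHERE month = '2024-05';
```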
This creates a secondary index on the release_year column of the songs table.
This retrieves all songs released in 2023. The idx_release_year index is what allows filtering on release_year, which is not part of the primary key.
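The index and the query described in the two sentences above, as a sketch against the songs table from the data model:

```cql
CREATE INDEX IF NOT EXISTS idx_release_year ON songs (release_year);

SELECT song_id, title, artist FROM songs WHERE release_year = 2023;
```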
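BEGIN BATCH
-- placeholder UUIDs: playlist_id and song_id are uuid columns, so values are unquoted UUID literals
UPDATE playlist_songs SET position = 2 WHERE playlist_id = 11111111-1111-1111-1111-111111111111 AND song_id = 22222222-2222-2222-2222-222222222222;
UPDATE playlist_songs SET position = 1 WHERE playlist_id = 11111111-1111-1111-1111-111111111111 AND song_id = 33333333-3333-3333-3333-333333333333;
APPLY BATCH;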
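This batch operation updates the positions of two songs in a playlist atomically: either both updates are applied or neither is, which keeps the playlist order consistent. Both rows share the same partition key (playlist_id), so this is a single-partition batch, which Cassandra also applies in isolation.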
This query retrieves songs with the highest engagement (likes and shares) in the
last 24 hours.
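A sketch of one way to store and query this data. The table name, the hourly partitioning, and the timestamp and UUID values are all hypothetical. CQL cannot order rows by likes + shares, so the ranking happens in the application.

```cql
-- One partition per hour, one counter row per song
CREATE TABLE IF NOT EXISTS song_engagement_by_hour (
    hour    TIMESTAMP,   -- interaction time truncated to the hour
    song_id UUID,
    likes   COUNTER,
    shares  COUNTER,
    skips   COUNTER,
    PRIMARY KEY (hour, song_id)
);

-- Record a like and a share for one song
UPDATE song_engagement_by_hour SET likes = likes + 1, shares = shares + 1
WHERE hour = '2024-05-01 14:00:00+0000' AND song_id = 22222222-2222-2222-2222-222222222222;

-- Read the last 24 hourly partitions (only two are listed here), then sum
-- likes + shares per song and sort in the application
SELECT hour, song_id, likes, shares FROM song_engagement_by_hour
WHERE hour IN ('2024-05-01 13:00:00+0000', '2024-05-01 14:00:00+0000');
```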
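• If querying the songs in a user's playlists is slow, note that playlist_songs has no user_id column to index. Instead, consider denormalizing into a table partitioned by user_id (with playlist_id and song_id as clustering columns), so a single query returns the songs of all of a user's playlists.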
• If querying user preferences is slow, consider denormalizing by adding frequently
accessed user preferences to the play_history table.
Measure the execution time before and after optimization to quantify the improvement.
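In cqlsh, request tracing is a simple way to take these measurements. After each statement it prints the steps the request went through with their elapsed times. A sketch using the release-year query from above:

```cql
-- cqlsh commands: print a trace, including elapsed time, after each request
TRACING ON;
SELECT song_id, title FROM songs WHERE release_year = 2023;
TRACING OFF;
```

Run the same query before and after the change and compare the elapsed times reported in the trace.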
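CQL does not support subqueries or joins, so songs not yet played by a user cannot be found with a SQL-style subquery (SELECT s.* FROM songs s ...). Instead, read the user's played song IDs from play_history, read a set of candidate songs, and exclude the played ones in application code. Scanning the whole songs table this way is inefficient for very large datasets, so narrow the candidates first (for example, by release year or genre) and page through the results.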
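A sketch of the two reads. The user UUID is hypothetical, and the candidate set is narrowed with the release_year index from the earlier query. The final exclusion step happens outside CQL.

```cql
-- Step 1: song IDs this user has played
SELECT song_id FROM play_history
WHERE user_id = 11111111-1111-1111-1111-111111111111;

-- Step 2: candidate songs, narrowed through the secondary index
SELECT song_id, title, artist FROM songs WHERE release_year = 2023;

-- Step 3 (application code): drop every candidate whose song_id appeared in step 1
```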
A Y Dadabhai Technical Institute, Kosamba Page 18
Introduction to NoSQL(4360704) ER NO:216010307044
True/False Statements:
2. Truncating a table removes the table structure along with its data.
4. The `nodetool status` command provides information about the cluster's data
consistency.
Ans:
ASSESSMENT RUBRICS
Criteria | Excellent (10) | Good (7) | Satisfactory (5) | Needs Improvement (3) | Marks
Proficiency in Basic Operations | Demonstrates mastery of basic operations and maintenance in Cassandra | Shows proficiency in basic operations and maintenance | Has a basic understanding of basic operations and maintenance | Struggles with basic Cassandra operations |
Monitoring and Troubleshooting | Excels in monitoring and effectively troubleshoots Cassandra issues | Competently monitors and troubleshoots | Demonstrates basic skills in monitoring and troubleshooting | Unable to effectively monitor and troubleshoot Cassandra |
Understanding of Cassandra Architecture | Exhibits an in-depth understanding of Cassandra architecture | Demonstrates a good grasp of Cassandra architecture | Shows a basic understanding of Cassandra architecture | Lacks understanding of Cassandra architecture |
Performance Tuning and Optimization | Effectively tunes and optimizes Cassandra for optimal performance | Adequately tunes and optimizes Cassandra with minor issues | Attempts to tune and optimize Cassandra but with significant issues | Struggles with basic performance tuning |
Average Marks
-----------