Pratham SQL 5

The document provides an overview of keyspace, table, and collection management in Apache Cassandra including examples of creating, altering, and dropping keyspaces and tables. It also discusses Cassandra data modeling concepts like collections, CRUD operations, and batch processing.
Introduction to NoSQL(4360704) ER NO:216010307044

PRACTICAL NO: 05

Aim: Data Modeling and Simple Queries with Cassandra


Objective

To provide students with a comprehensive introduction to Apache Cassandra, emphasizing
its schema-agnostic data model, the role of keyspace, and basic concepts of the Cassandra
Query Language (CQL). Students will gain hands-on experience by setting up a Cassandra
cluster, configuring essential parameters, and executing basic CQL commands.

Prerequisite Theory

Basic operations and maintenance

In Cassandra, the management of keyspaces involves creating, altering, and dropping. Here
are examples of how you can perform these actions using the Cassandra Query Language
(CQL):

Create Keyspace:

To create a keyspace, use the `CREATE KEYSPACE` statement. Replace 'your_keyspace'
with the desired keyspace name and set the replication strategy and factor
based on your requirements.

CREATE KEYSPACE your_keyspace

WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

Alter Keyspace:

To alter an existing keyspace, use the `ALTER KEYSPACE` statement. This can
include modifying the replication strategy, replication factor, or other configuration
options.

ALTER KEYSPACE your_keyspace

WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};

Drop Keyspace:

To drop (delete) a keyspace and all its data, use the `DROP KEYSPACE` statement. Be
cautious, as this operation is irreversible.

DROP KEYSPACE your_keyspace;

These statements provide the basic syntax for creating, altering, and dropping keyspaces in
Cassandra. Customize the keyspace names, replication strategies, and other parameters
based on your specific use case and requirements.
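The replication factor also decides how many replicas must answer at quorum-based consistency levels. The following Python sketch applies the usual quorum rule, floor(RF / 2) + 1, to the replication settings used in the examples above. The helper names are illustrative assumptions, not part of Cassandra or any driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


# SimpleStrategy example from the text: replication_factor = 3
print(quorum(3), tolerated_failures(3))

# NetworkTopologyStrategy example from the text: DC1 = 3, DC2 = 2
per_dc = {"DC1": 3, "DC2": 2}
local_quorums = {dc: quorum(rf) for dc, rf in per_dc.items()}  # LOCAL_QUORUM per data center
cluster_quorum = quorum(sum(per_dc.values()))                  # QUORUM across all replicas
print(local_quorums, cluster_quorum)
```

With a replication factor of 2, a quorum needs both replicas, so DC2 in this example cannot lose a node and still serve LOCAL_QUORUM requests.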

A Y Dadabhai Technical Institute, Kosamba Page 1



In Cassandra, you can manage tables using the Cassandra Query Language (CQL).
Below are examples of creating and dropping tables:

Create Table:

To create a table, use the `CREATE TABLE` statement. Replace 'your_table' with the
desired table name and specify the columns along with their data types.

CREATE TABLE your_table (
    id UUID PRIMARY KEY,
    name TEXT,
    age INT,
    email TEXT
);

Drop Table:

To drop (delete) a table, use the `DROP TABLE` statement. Be cautious, as this
operation permanently removes the table and its data.

DROP TABLE your_table;

These statements provide the basic syntax for creating and dropping tables in
Cassandra. Customize the table names, column definitions, and other parameters based on
your specific use case and requirements.

In Cassandra, the `TRUNCATE` statement is used to remove all data from a table while
keeping the table structure intact. The result is the same as deleting every row, but it is
more efficient because Cassandra discards the table's data files instead of writing a
tombstone for each row.

Truncate Table:

To truncate a table, use the following `TRUNCATE` statement:

TRUNCATE your_table;

Replace 'your_table' with the name of the table you want to truncate. This statement
will remove all rows from the specified table, but the table structure, including column
definitions and indexes, will remain unchanged.

It's important to note that truncating a table cannot be undone from CQL, and it removes
all data from the table. The only way back is a snapshot (the `auto_snapshot` option in
cassandra.yaml, enabled by default, takes one before a truncate). Ensure you have a backup
or are certain about this action before executing it, especially in a production environment.

In Cassandra, you can create and drop indexes using the Cassandra Query Language
(CQL). Below are examples of creating and dropping an index on a table:

Create Index:


To create an index on a column, use the `CREATE INDEX` statement. A secondary index lets
you filter on a column that is not part of the primary key. It works best when the query
also restricts the partition key; an index does not make every query faster.

CREATE INDEX your_index_name ON your_table (your_column);

Replace 'your_index_name' with the desired name for the index, 'your_table' with
the table name, and 'your_column' with the column on which you want to create the index.

Drop Index:

To drop (delete) an index, use the `DROP INDEX` statement. This removes the specified
index from the table.

DROP INDEX your_index_name;

Replace 'your_index_name' with the name of the index you want to drop. This
statement removes the index from the table but does not affect the table structure or data.

Ensure that you carefully consider the implications of adding and removing indexes,
as it can impact the performance and storage requirements of your Cassandra database.

In Cassandra, the `BATCH` statement is used to group multiple write statements
(`INSERT`, `UPDATE`, or `DELETE`; `SELECT` is not allowed) into a single logical operation.
A logged batch guarantees that if any statement in the batch is applied, all of them
eventually will be. Batches are useful when you need to maintain consistency across
multiple write operations; they are not a way to speed up bulk loading.

Basic Batch Syntax:

BEGIN BATCH

// CQL statements to be executed

APPLY BATCH;

Here is a more detailed example:

BEGIN BATCH

INSERT INTO your_table (id, name, age) VALUES (uuid(), 'pratham', 18);

UPDATE your_table SET email = '[email protected]' WHERE id = <some_id>;

DELETE FROM another_table WHERE id = <another_id>;

APPLY BATCH;

- `BEGIN BATCH`: Indicates the beginning of the batch.

- `APPLY BATCH`: Indicates the end of the batch.


CRUD OPERATIONS

In Cassandra, you can perform basic data manipulation operations such as insert, select,
update, and delete using the Cassandra Query Language (CQL). Here are examples of each
operation:

Insert:

To insert data into a table, use the `INSERT INTO` statement:

INSERT INTO your_table (id, name, age, email) VALUES (uuid(), 'mayur', 18,
'[email protected]');

Replace 'your_table' with the name of your table and adjust the values accordingly.

Select:

To retrieve data from a table, use the `SELECT` statement:

SELECT * FROM your_table WHERE id = <some_id>;

Replace 'your_table' with the table name and `<some_id>` with the specific identifier.

Update:

To update existing data in a table, use the `UPDATE` statement:

UPDATE your_table SET email = '[email protected]' WHERE id = <some_id>;

Replace 'your_table' with the table name, `<some_id>` with the specific identifier, and
adjust the values accordingly.

Delete:

To delete data from a table, use the `DELETE` statement:

DELETE FROM your_table WHERE id = <some_id>;

Replace 'your_table' with the table name and `<some_id>` with the specific identifier.

Remember to replace placeholders such as 'your_table' and '<some_id>' with your
actual table name and identifier. Additionally, ensure that your data model and application
requirements guide the structure of these queries for optimal performance and scalability.

CASSANDRA COLLECTIONS

Cassandra collections let a single column hold multiple values, such as a user's tags or
phone numbers, without creating a separate table.


In Cassandra, `SET`, `LIST`, and `MAP` are collection types that allow you to store multiple
values within a single column. These collections can be useful when you need to handle
scenarios where a column contains multiple items. Here's a brief overview of each:

SET:

A `SET` is a collection of unique elements. Each element in the set must
be of the same data type. Duplicate values are not allowed, and Cassandra returns
the elements in sorted order rather than insertion order.

Example:

CREATE TABLE example_set (
    id UUID PRIMARY KEY,
    tags SET<TEXT>
);

INSERT INTO example_set (id, tags) VALUES (uuid(), {'tag1', 'tag2', 'tag3'});

LIST:

A `LIST` is an ordered collection of elements where duplicates are allowed. All elements
in the list must be of the data type declared for the list (for example, `LIST<TEXT>`
holds only text values).

Example:

CREATE TABLE example_list (
    id UUID PRIMARY KEY,
    comments LIST<TEXT>
);

INSERT INTO example_list (id, comments) VALUES (uuid(), ['Comment 1', 'Comment 2', 'Comment 3']);

MAP:

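A `MAP` is a collection of key-value pairs where each key is associated with a
specific value. The key type and the value type may differ from each other (for example,
`MAP<TEXT, INT>`), but every key must be of the declared key type and every value of the
declared value type.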

Example:

CREATE TABLE example_map (
    id UUID PRIMARY KEY,
    properties MAP<TEXT, TEXT>
);

INSERT INTO example_map (id, properties) VALUES (uuid(), {'key1': 'value1', 'key2': 'value2'});

In these examples, `SET`, `LIST`, and `MAP` are used to store collections of tags,
comments, and properties, respectively. It's important to note that while these collection
types offer flexibility, their usage should align with your specific data modeling needs and
query patterns. Additionally, consider the impact on performance and scalability when
working with large collections.
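To make the three literal forms concrete, the following Python sketch renders Python values as the CQL collection literals used in the INSERT examples above. The function names are illustrative assumptions. A real application would normally pass sets, lists, and dicts as bound parameters through a driver instead of building literals by hand.

```python
def cql_text(value: str) -> str:
    """Quote a text value for CQL; a single quote is escaped by doubling it."""
    return "'" + value.replace("'", "''") + "'"


def cql_set(values) -> str:
    """SET literal: duplicates collapse, elements shown in sorted order."""
    return "{" + ", ".join(cql_text(v) for v in sorted(set(values))) + "}"


def cql_list(values) -> str:
    """LIST literal: order and duplicates are preserved."""
    return "[" + ", ".join(cql_text(v) for v in values) + "]"


def cql_map(mapping) -> str:
    """MAP literal: key: value pairs inside braces."""
    pairs = (f"{cql_text(k)}: {cql_text(v)}" for k, v in mapping.items())
    return "{" + ", ".join(pairs) + "}"


print(cql_set(["tag2", "tag1", "tag2"]))
print(cql_list(["Comment 1", "Comment 1", "Comment 2"]))
print(cql_map({"key1": "value1", "key2": "value2"}))
```

The set example drops the repeated 'tag2', while the list example keeps both copies of 'Comment 1', which mirrors the difference between the two collection types.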

MONITORING AND TROUBLESHOOTING

Monitoring a Cassandra cluster involves using tools and techniques to assess its
health, performance, and other relevant metrics.

`nodetool` is a command-line utility in Apache Cassandra that provides various
operations and management tasks for interacting with and monitoring Cassandra nodes. It
is a powerful tool for system administrators and developers to perform actions on a
Cassandra cluster. Here are some commonly used `nodetool` commands:

(1) Status and Information:

- Node Status: nodetool status

Displays the status of each node in the cluster, including its state (UN - Up/Normal,
DN - Down/Normal), load, and tokens.

- Node Information: nodetool info

Provides information about the node it is run against, such as load, uptime, heap memory
usage, data center, and rack. Use `nodetool version` to print the Cassandra version.
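The two-letter codes in the first column of `nodetool status` combine a status letter (U or D) with a state letter (N, L, J, or M), as the legend printed at the top of the command's output explains. The Python sketch below decodes them. The function name is an illustrative assumption, and it only interprets the code, not the rest of the output.

```python
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def decode_status(code: str) -> str:
    """Turn a code such as 'UN' into a readable 'Up/Normal'."""
    if len(code) != 2 or code[0] not in STATUS or code[1] not in STATE:
        raise ValueError(f"unrecognized status code: {code!r}")
    return f"{STATUS[code[0]]}/{STATE[code[1]]}"


for code in ("UN", "DN", "UJ"):
    print(code, "=", decode_status(code))
```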

(2) Performance and Metrics:

- Compaction Stats: nodetool compactionstats

Displays information about ongoing compactions.

- Thread Pool Stats: nodetool tpstats

Shows statistics for thread pools, helping identify performance bottlenecks.

- Table Statistics: nodetool cfstats

Displays per-table (column family) statistics, including read and write latencies. Newer
releases also provide the same command as `nodetool tablestats`.

(3) Data Management:


- Table Snapshot: nodetool snapshot --table <table> <keyspace>

Takes a snapshot of a specific table in a keyspace. Useful for backup purposes.

- Compact Tables: nodetool compact <keyspace> <table>

Forces a major compaction on a specific table, merging its SSTables and purging data that
is eligible for removal.

- Flush Tables: nodetool flush <keyspace> <table>

Flushes data from memtables to disk for a specific table.

(4) Ring Management:

- Token Ring: nodetool ring

Displays the token ring information, showing the distribution of tokens across the cluster.

- Move Node: nodetool move <new_token>

Moves a node to a new token in the ring.

- Decommission Node: nodetool decommission

Decommissions a node from the Cassandra cluster.

(5) Repair and Maintenance:

- Repair Node: nodetool repair

Initiates a repair operation, which synchronizes replicas to ensure data consistency.

- Cleanup Node: nodetool cleanup

Removes data that the node no longer owns, for example after new nodes have joined the ring.

- Compaction Throughput: nodetool setcompactionthroughput <value_in_mb>

Sets the node's compaction throughput limit in MB/s (0 disables throttling). It does not
change the GC grace period, which is the per-table `gc_grace_seconds` property.

These are just a few examples of the numerous `nodetool` commands available. Running
`nodetool help` provides a list of available commands and their
descriptions. Always refer to the official documentation for the specific version of
Cassandra you are using for the most accurate and up-to-date information.

PERFORMANCE TUNING AND OPTIMIZATION


Performance tuning and optimization in Apache Cassandra involve configuring and
adjusting various settings to ensure the cluster operates efficiently and meets the desired
performance goals.

(1) Memory Settings:

# Memory allocation settings for the Java Virtual Machine (JVM)

# These flags go in conf/jvm.options (one per line), not in cassandra.yaml

# -Xms: Initial heap size

# -Xmx: Maximum heap size

# -Xmn: Young generation size

# Adjust values based on your system's RAM and workload

-Xms4G

-Xmx4G

-Xmn800M

(2) File Cache Size:

# Maximum memory used for the SSTable chunk cache and buffer pool

# Adjust based on available system memory

file_cache_size_in_mb: 512

(3) Disk Access Mode:

# Method for Cassandra to access data on disk

# Options: mmap, standard, or mmap_index_only

disk_access_mode: mmap

(4) Commit Log Settings:

# Commit log synchronization settings

# periodic: Periodically flushes the commit log to disk

# commitlog_sync_period_in_ms: Time interval for periodic commit log flush

commitlog_sync: periodic

commitlog_sync_period_in_ms: 10000

(5) Concurrent Reads and Writes:

# Number of concurrent read and write operations per node



# Adjust based on workload and system capacity

concurrent_reads: 32

concurrent_writes: 32

(6) Native Transport Settings:

# Native transport (CQL) settings

# native_transport_max_threads: Maximum number of threads to process native transport (CQL) requests

native_transport_max_threads: 2048

(7) Endpoint Snitch:

# Strategy for determining proximity and network topology

# Choose an appropriate snitch for your deployment

endpoint_snitch: GossipingPropertyFileSnitch

(8) Read and Write Timeouts:

# Timeout settings for read and write operations

# Adjust based on application requirements

read_request_timeout_in_ms: 5000

write_request_timeout_in_ms: 2000

(9) Consistency Levels:

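-- Consistency levels for read and write operations

-- Consistency is not a cassandra.yaml setting; the client chooses it per request

-- (through the driver, or with the CONSISTENCY command in cqlsh as shown here)

-- A common choice is ONE for reads and LOCAL_QUORUM for writes

-- Use LOCAL_QUORUM to avoid cross-data-center round trips in multi-data center setups

CONSISTENCY LOCAL_QUORUM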

(10) Additional JVM and GC Tuning:

# Additional Java Virtual Machine (JVM) options and garbage collection tuning

# These flags go in conf/jvm.options (one per line), not in cassandra.yaml

# Adjust based on your specific JVM version and garbage collection strategy

# Monitor GC logs to optimize settings

-XX:+UseG1GC

-XX:MaxGCPauseMillis=500

Always refer to the official Cassandra documentation for version 3.11 for detailed
information on these settings.

COMPACTION STRATEGY

Compaction is the process of merging multiple SSTables (Sorted String Tables) into
a smaller number of SSTables, reducing storage space and improving read performance.
Cassandra provides different compaction strategies, each with its own advantages and use
cases. Here are some common compaction strategies in Cassandra:

(1) Size Tiered Compaction Strategy (STCS):

• Description: Groups SSTables of similar size and merges them into larger ones.
• Use Case: Suitable for write-intensive workloads with uniform data distribution.

Compaction is configured per table through CQL rather than in cassandra.yaml:

ALTER TABLE your_table

WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

(2) Leveled Compaction Strategy (LCS):

• Description: Organizes SSTables into levels. Each level holds about ten times as much
data as the one before it, and SSTables within a level do not overlap.
• Use Case: Suitable for read-heavy workloads; a read touches fewer SSTables, at the
cost of more compaction I/O.

ALTER TABLE your_table

WITH compaction = {'class': 'LeveledCompactionStrategy'};


(3) Time Window Compaction Strategy (TWCS):

• Description: Groups SSTables based on time intervals, compacts data within each
time window.
• Use Case: Suitable for time-series data where older data can be compacted
separately from newer data.

ALTER TABLE your_table

WITH compaction = {'class': 'TimeWindowCompactionStrategy'};

(4) Date Tiered Compaction Strategy (DTCS):

• Description: An older time-based strategy that groups SSTables by the age of their
data. It has been deprecated in favor of TimeWindowCompactionStrategy.
• Use Case: Time-series data on older clusters; prefer TWCS for new tables.

ALTER TABLE your_table

WITH compaction = {'class': 'DateTieredCompactionStrategy'};

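(5) Size Tiered Compaction Strategy for TTL data:

• Description: SizeTieredCompactionStrategy tuned for data written with a Time-To-Live
(TTL). Apache Cassandra has no `STCSIngestTTL` option; expired data is purged through
the tombstone sub-options that every strategy accepts, such as `tombstone_threshold`
and `unchecked_tombstone_compaction`.
• Use Case: Suitable for write-intensive workloads with TTL-enabled data.

ALTER TABLE your_table

WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'unchecked_tombstone_compaction': 'true'};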

Choose the compaction strategy based on your specific use case, workload
characteristics, and performance requirements. Always monitor and test different
strategies in a controlled environment to determine the most effective one for your
Cassandra deployment.
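In Apache Cassandra the compaction strategy is a table-level property, set with `ALTER TABLE ... WITH compaction`. The Python sketch below only builds that statement as text so the shape of the options map is easy to see. The function name and the table names (`your_table`, `events_by_day`) are illustrative assumptions; `compaction_window_unit` and `compaction_window_size` are the TimeWindowCompactionStrategy window options.

```python
def compaction_statement(table: str, strategy: str, **options) -> str:
    """Build an ALTER TABLE statement that sets the compaction strategy."""
    settings = {"class": strategy, **options}
    # CQL writes the options as a map of quoted keys to quoted values.
    body = ", ".join(f"'{key}': '{value}'" for key, value in settings.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


print(compaction_statement("your_table", "LeveledCompactionStrategy"))
print(
    compaction_statement(
        "events_by_day",
        "TimeWindowCompactionStrategy",
        compaction_window_unit="DAYS",
        compaction_window_size=1,
    )
)
```

Changing the strategy on an existing table triggers recompaction of its data, so it is worth trying on a test cluster first, as the paragraph above recommends.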


Practical related Questions


QUERY 1 :

✓ Create a keyspace named "ecommerce" with a replication strategy of
'NetworkTopologyStrategy' and replication factor of 3, placing replicas in
'DC1' and 'DC2'.
✓ Alter the keyspace "ecommerce" to change the replication strategy to
'SimpleStrategy' with a replication factor of 2.
✓ Create a table named "products" with columns: product_id (UUID, primary
key), product_name (TEXT), price (DOUBLE), and stock_quantity (INT).
✓ Alter the table "products" to add a new column "manufacturer" of type TEXT.
✓ Create an index named "idx_product_name" on the "products" table for the
"product_name" column.
✓ Insert a new product into the "products" table with the following values:
(uuid(), 'Laptop', 1200.0, 50, 'Dell').
✓ Select all products from the "products" table.
✓ Update the stock_quantity of the product with product_id 'some_id' to 40.
✓ Delete the product with product_id 'some_id' from the "products" table.
✓ Use `nodetool status` to check the status of the Cassandra nodes in the
cluster.
✓ Use `nodetool compactionstats` to view information about ongoing
compactions.
✓ Take a snapshot of the "ecommerce" keyspace using `nodetool snapshot
ecommerce`.
✓ Truncate the "products" table.
✓ Drop the index "idx_product_name."
✓ Drop the table "products."
✓ Drop the keyspace "ecommerce."


Step-by-step solution for the queries above:

1. Create Keyspace:

CREATE KEYSPACE ecommerce WITH replication =
{ 'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3 };

This creates a keyspace named "ecommerce" with a replication strategy called
"NetworkTopologyStrategy". This strategy ensures data is replicated across multiple data
centers ("DC1" and "DC2") with a replication factor of 3 in each data center. The replication
factor specifies how many copies of the data are stored.

2. Alter Keyspace:

ALTER KEYSPACE ecommerce WITH replication =
{ 'class': 'SimpleStrategy', 'replication_factor': 2 };

This alters the "ecommerce" keyspace to use the "SimpleStrategy" instead. This strategy
replicates data to a specified number of nodes (2 in this case) regardless of their location.

3. Create Table:

USE ecommerce;

CREATE TABLE products (
    product_id uuid PRIMARY KEY,
    product_name text,
    price double,
    stock_quantity int
);

This creates a table named "products" with four columns:

• product_id: This is a UUID (Universally Unique Identifier) and is the primary key of
the table.
• product_name: This is a text column that stores the product name.
• price: This is a double-precision floating-point column that stores the product price.
• stock_quantity: This is an integer column that stores the product's stock quantity.

4. Alter Table (Add Column):

ALTER TABLE products ADD manufacturer text;

This alters the "products" table to add a new column named "manufacturer" of type text.
This will allow you to store the manufacturer information for each product.


5. Create Index:

CREATE INDEX idx_product_name ON products(product_name);

This creates an index named "idx_product_name" on the "product_name" column of the
"products" table. The index lets you query products by name even though product_name is
not part of the primary key.

6. Insert Data:

INSERT INTO products (product_id, product_name, price, stock_quantity, manufacturer)
VALUES (uuid(), 'Laptop', 1200.0, 50, 'Dell');

This inserts a new product with the specified details into the "products" table. The uuid()
function generates a unique identifier for the product.

7. Select All Products:

SELECT * FROM products;

This retrieves all data from the "products" table, including the newly inserted product.

8. Update Stock:

UPDATE products SET stock_quantity = 40 WHERE product_id = <some_id>;

This updates the stock quantity of the product with the specified product_id to 40.
Replace <some_id> with the actual product ID, written as an unquoted UUID literal; a
quoted string is rejected for a uuid column.

9. Delete Product:

DELETE FROM products WHERE product_id = <some_id>;

This deletes the product with the specified product_id from the "products"
table. Again, replace <some_id> with the actual product ID (an unquoted UUID literal).

10. Check Node Status:

Open a terminal or command prompt and run the following command:

nodetool status

This command displays the status of the Cassandra nodes in your cluster, including
each node's state, address, load, token count, and ownership percentage.


11. View Compaction Stats:

Open a terminal or command prompt and run the following command:

nodetool compactionstats

This command shows information about ongoing compactions in your Cassandra cluster.
Compaction is a process that optimizes data storage by merging smaller data files into
larger ones.

12. Take a Snapshot:

Open a terminal or command prompt and run the following command:

nodetool snapshot ecommerce

This command takes a snapshot of the "ecommerce" keyspace. Snapshots are backups that
you can use to restore your data if needed.

13. Truncate Table:

TRUNCATE TABLE products;

This removes all data from the "products" table but keeps the table structure intact.

14. Drop the Index, Table, and Keyspace:

To drop the index “idx_product_name”, the table “products”, and the keyspace
“ecommerce”, use the following commands:

DROP INDEX IF EXISTS ecommerce.idx_product_name;

DROP TABLE IF EXISTS ecommerce.products;

DROP KEYSPACE IF EXISTS ecommerce;

Remember to replace the some_id placeholder with actual values as needed.


QUERY NO: 02

✓ Design a data model for a music streaming service. Consider entities such as users,
songs, playlists, and play history. Define tables and relationships to efficiently support
queries for user-specific playlists, recently played songs, and popular songs.
✓ Write a CQL query to find the top 5 most played songs in the last month. Consider
using appropriate aggregation functions and time-based filtering.
✓ Create a secondary index on a non-primary key column of the songs table. Write a
query to retrieve all songs released in a specific year using this secondary index.
✓ Implement a batch operation to update the order of songs in a user's playlist.
Demonstrate how batching can be utilized to ensure atomicity for multiple update
operations.
✓ Design a table to store real-time analytics data for user interactions (likes, shares,
skips) with songs. Write a query to retrieve songs with the highest engagement in the
last 24 hours.
✓ Identify a slow-performing query in your data model. Use appropriate techniques
(indexes, denormalization) to optimize the query's performance, and compare the
execution times before and after optimization.
✓ Simulate a scenario with a large data set of songs and users. Write a query to retrieve
songs that have not been played by a specific user, considering efficient handling of
large data.
✓ Create a query to retrieve songs based on a user's preferences, considering factors
like genre, tempo, and artist. Optimize the query for minimal response time.

Designing a data model for a music streaming service involves considering various
entities such as users, songs, playlists, play history, and real-time analytics.

Here's a data model for your music streaming service:

Tables:

1. users (user_id uuid PRIMARY KEY, username text, email text, ...)
2. songs (song_id uuid PRIMARY KEY, title text, artist text, album text, release_year int,
genre text, ...)
3. playlists (user_id uuid, playlist_id uuid, name text, PRIMARY KEY (user_id,
playlist_id))
4. playlist_songs (playlist_id uuid, song_id uuid, position int, PRIMARY KEY
(playlist_id, song_id))
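5. play_history (user_id uuid, timestamp timestamp, song_id uuid, PRIMARY KEY
(user_id, timestamp, song_id)); clustering by timestamp keeps one row per play, so
repeat plays of the same song are not overwritten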
6. user_preferences (user_id uuid, genre list<text>, tempo int, artist list<text>,
PRIMARY KEY (user_id))


Relationships:

• A user can have many playlists (users -> playlists).
• A playlist can contain many songs (playlists -> playlist_songs).
• A user can have many play history entries (users -> play_history).
• A user can have preferences for genres, tempo, and artists (users ->
user_preferences).

This model allows for efficient queries on user-specific playlists, recently played songs, and
popular songs.

CQL Queries:

1. Top 5 Most Played Songs (Last Month):

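CREATE TABLE song_plays_by_month (
    play_month text,
    song_id uuid,
    play_count counter,
    PRIMARY KEY (play_month, song_id)
);

UPDATE song_plays_by_month SET play_count = play_count + 1
WHERE play_month = '2024-05' AND song_id = <song_id>;

SELECT song_id, play_count FROM song_plays_by_month WHERE play_month = '2024-05';

A SQL-style query with dateSub(), GROUP BY song_id, and ORDER BY play_count does not run
in Cassandra: CQL has no dateSub() function, groups only by primary key columns, and cannot
order by an aggregate. One common alternative, sketched here with an assumed counter table
(song_plays_by_month is not part of the model above), is to increment a counter on every
play, read the partition for the month, and pick the top 5 in the application. Calendar
months stand in for "the last month" in this sketch.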
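Because Cassandra does not sort results by an aggregate, the ranking step is usually done in the client. The Python sketch below assumes the application has already fetched `(song_id, played_at)` pairs from play_history; the function name and the sample rows are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_played(rows, now, days=30, limit=5):
    """Count plays per song within the window and return the most played."""
    cutoff = now - timedelta(days=days)
    counts = Counter(song_id for song_id, played_at in rows if played_at >= cutoff)
    return counts.most_common(limit)


now = datetime(2024, 5, 31, 12, 0)
rows = [
    ("song-a", datetime(2024, 5, 30)),
    ("song-a", datetime(2024, 5, 29)),
    ("song-b", datetime(2024, 5, 28)),
    ("song-c", datetime(2024, 1, 1)),  # outside the 30-day window
]
print(top_played(rows, now))
```

This works for small result sets. For large play histories, counting on the write path (for example with a counter table) avoids reading every play back.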

2. Secondary Index on Release Year:

CREATE INDEX idx_release_year ON songs(release_year);

This creates a secondary index on the release_year column of the songs table.

SELECT * FROM songs
WHERE release_year = 2023;

This retrieves all songs released in 2023. CQL has no `USE INDEX` clause; Cassandra uses
the idx_release_year index automatically because the WHERE clause restricts the indexed
column.

3. Batch Update Playlist Order:

BEGIN BATCH
UPDATE playlist_songs SET position = 2 WHERE playlist_id = <playlist_id> AND
song_id = <song_id_1>;
UPDATE playlist_songs SET position = 1 WHERE playlist_id = <playlist_id> AND
song_id = <song_id_2>;
APPLY BATCH;


This batch operation updates the positions of two songs in a playlist atomically. Because
both rows share the same partition key (playlist_id), Cassandra applies the batch as a
single mutation, so both updates happen or neither happens, maintaining playlist consistency.
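For playlists with more than a couple of songs, the batch is easier to generate than to write by hand. The Python sketch below builds the batch text from a playlist ID and the desired song order; the function name and the sample UUIDs are illustrative assumptions. UUID values are written unquoted, as CQL expects for uuid columns.

```python
import uuid


def reorder_batch(playlist_id: uuid.UUID, ordered_song_ids) -> str:
    """Build one batch that assigns positions 1..n in the given order."""
    lines = ["BEGIN BATCH"]
    for position, song_id in enumerate(ordered_song_ids, start=1):
        lines.append(
            f"  UPDATE playlist_songs SET position = {position} "
            f"WHERE playlist_id = {playlist_id} AND song_id = {song_id};"
        )
    lines.append("APPLY BATCH;")
    return "\n".join(lines)


playlist = uuid.UUID(int=1)
new_order = [uuid.UUID(int=3), uuid.UUID(int=2)]
print(reorder_batch(playlist, new_order))
```

All statements target the same playlist_id, so the batch stays within one partition. With a driver, the same idea is normally expressed with prepared statements and bound values rather than formatted text.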

4. Real-time Analytics (Engagement):

CREATE TABLE user_engagement (
    user_id uuid,
    song_id uuid,
    action text,          -- 'like', 'share', or 'skip'
    timestamp timestamp,
    PRIMARY KEY (user_id, song_id, timestamp)
);

This table stores user interactions with songs.

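CREATE TABLE song_engagement_by_hour (
    hour_bucket text,     -- for example '2024-05-31T13'
    song_id uuid,
    likes counter,
    shares counter,
    PRIMARY KEY (hour_bucket, song_id)
);

SELECT song_id, likes, shares FROM song_engagement_by_hour
WHERE hour_bucket IN (<last_24_hour_buckets>);

A SQL-style query that filters on action, groups by song_id, and orders by the count does
not run in Cassandra, because action and song_id are not usable for filtering and grouping
in that way and CQL cannot order by an aggregate. One option, sketched here with an assumed
counter table (song_engagement_by_hour is not part of the model above), is to increment
likes and shares per song and hour, read the 24 most recent hour buckets, and sum and rank
the songs in the application to find the highest engagement in the last 24 hours.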

5. Slow Query Optimization:

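• If querying user playlists is slow, check that the query restricts the partition key
(user_id) of the playlists table. The playlist_songs table has no user_id column, so a
secondary index there does not help; an index is only useful for lookups on non-key
columns such as the playlist name.
• If querying user preferences is slow, consider denormalizing by adding frequently
accessed user preferences to the play_history table.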

Measure the execution time before and after optimization to quantify the improvement.

6. Songs Not Played by a User:

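SELECT song_id FROM songs;

SELECT song_id FROM play_history WHERE user_id = <user_id>;

CQL does not support subqueries or NOT IN, so the exclusion cannot be expressed in a single
statement. The application reads the song IDs the user has played (a single-partition
query) and removes them from the full list of songs. Scanning the whole songs table is
inefficient for very large datasets, so the results should be paged.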
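Since Cassandra has no subqueries, the exclusion step is typically done in the client. The Python sketch below assumes the application has already fetched the full list of song IDs and the IDs the user has played; the function name and the sample IDs are illustrative assumptions.

```python
def unplayed_songs(all_song_ids, played_song_ids):
    """Return the songs the user has not played, keeping catalogue order."""
    played = set(played_song_ids)  # set lookup keeps the filter linear in catalogue size
    return [song_id for song_id in all_song_ids if song_id not in played]


catalogue = ["song-a", "song-b", "song-c", "song-d"]
history = ["song-b", "song-d", "song-b"]  # repeat plays are harmless
print(unplayed_songs(catalogue, history))
```

For a very large catalogue the first argument can be any iterable, so the songs can be streamed page by page instead of being loaded at once.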

7. Songs Based on User Preferences:

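SELECT s.*
FROM songs s
INNER JOIN user_preferences up ON s.genre IN up.genre
WHERE up.user_id = 'user_id' AND (s..)

The query above is cut off in the source. It is also SQL-style: CQL does not support JOIN,
so in Cassandra the application would first read the user's row from user_preferences and
then look up songs for each preferred genre, for example through a table or index keyed
by genre.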

True/False Statements:

1. Creating an index improves the performance of queries.

2. Truncating a table removes the table structure along with its data.

3. ALTER KEYSPACE is used to modify replication strategies.

4. The `nodetool status` command provides information about the cluster's data
consistency.

Ans:


ASSESSMENT RUBRICS

| Criteria | Excellent (10) | Good (7) | Satisfactory (5) | Needs Improvement (3) | Marks |
|---|---|---|---|---|---|
| Proficiency in Basic Operations | Demonstrates mastery of basic operations and maintenance in Cassandra | Shows proficiency in basic operations and maintenance | Has a basic understanding of basic operations and maintenance | Struggles with basic Cassandra operations | |
| Monitoring and Troubleshooting | Excels in monitoring and effectively troubleshoots Cassandra issues | Competently monitors and troubleshoots | Demonstrates basic skills in monitoring and troubleshooting | Unable to effectively monitor and troubleshoot Cassandra | |
| Understanding of Cassandra Architecture | Exhibits an in-depth understanding of Cassandra architecture | Demonstrates a good grasp of Cassandra architecture | Shows a basic understanding of Cassandra architecture | Lacks understanding of Cassandra architecture | |
| Performance Tuning and Optimization | Effectively tunes and optimizes Cassandra for optimal performance | Adequately tunes and optimizes Cassandra with minor issues | Attempts to tune and optimize Cassandra but with significant issues | Struggles with basic performance tuning | |
| Average Marks | | | | | |

-----------

Signature with date
