Become a super modeler

Patrick McFadin @PatrickMcFadin


Senior Solutions Architect
DataStax

Thursday, May 16, 13




... the saga continues.

This is the second part of a data modeling series

Part 1: The data model is dead, long live the data model!

• Relational -> Cassandra topics
• Basic entity modeling
• One-to-many
• Many-to-many
• Transaction-like modeling



Becoming a super modeler

• Data model is the key to happiness
• Successful deployments depend on it
• Not just a Cassandra problem...



Time series - Basic

• Weather station collects regular temperature readings
• Each weather station is a row
• Each event is a new column in a wide row

CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);
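
Reading the wide row back is a slice on the clustering column. A minimal sketch, assuming a hypothetical station ID '1234ABCD' and made-up readings (neither appears in the slides):

```cql
-- Hypothetical station ID and reading; illustrative values only.
INSERT INTO temperature (weatherstation_id, event_time, temperature)
VALUES ('1234ABCD', '2013-04-03 07:01:00', '72F');

-- Slice one station's row by time range; results are ordered by event_time.
SELECT event_time, temperature
FROM temperature
WHERE weatherstation_id = '1234ABCD'
  AND event_time >= '2013-04-03 07:00:00'
  AND event_time <  '2013-04-03 08:00:00';
```

The row key picks the station and the range on event_time picks the columns, so the query never touches another station's data.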



Time series - Super!

• Every second? Row would be too big
• Order by access pattern
• Partition the rows by day
- One weather station by day

CREATE TABLE temperature_by_day (
weatherstation_id text,
date text,
event_time timestamp,
temperature text,
PRIMARY KEY ((weatherstation_id,date),event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

Compound row key: (weatherstation_id, date)
Reverse sort: Last event, first on row
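
The reverse sort pays off when the usual question is "what is the latest reading?". A short sketch, again assuming a hypothetical station ID and a date bucket formatted as 'YYYY-MM-DD' (the slides do not specify the format):

```cql
-- Hypothetical station ID; the date bucket is a text value chosen by the application.
INSERT INTO temperature_by_day (weatherstation_id, date, event_time, temperature)
VALUES ('1234ABCD', '2013-04-03', '2013-04-03 07:01:00', '72F');

-- Both parts of the compound row key must be supplied.
-- With DESC clustering order, LIMIT 1 returns the most recent event of that day.
SELECT event_time, temperature
FROM temperature_by_day
WHERE weatherstation_id = '1234ABCD'
  AND date = '2013-04-03'
LIMIT 1;
```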



User model - basic

• Plain ole entity table
• One primary key
• Booooring

CREATE TABLE users (
username text PRIMARY KEY,
first_name text,
last_name text,
address1 text,
city text,
postal_code text,
last_login timestamp
);



Cassandra feature - Collections

• Collections give you three types:
- Set
- List
- Map
• Each allows dynamic updates
• Fully supported in CQL 3
• Requires serialization so don’t go crazy

CREATE TABLE collections_example (
id int PRIMARY KEY,
set_example set<text>,
list_example list<text>,
map_example map<int,text>
);



Cassandra Collections - Set

• Set is sorted by CQL type comparator

set_example set<text>

Collection name: set_example. Collection type: set. CQL type: text.

INSERT INTO collections_example (id, set_example)
VALUES(1, {'1-one', '2-two'});



Cassandra Collections - Set Operations

• Adding an element to the set

UPDATE collections_example
SET set_example = set_example + {'3-three'} WHERE id = 1;

• After adding this element, it will sort to the beginning.

UPDATE collections_example
SET set_example = set_example + {'0-zero'} WHERE id = 1;

• Removing an element from the set

UPDATE collections_example
SET set_example = set_example - {'3-three'} WHERE id = 1;



Cassandra Collections - List

• Ordered by insertion

list_example list<text>

Collection name: list_example. Collection type: list. CQL type: text.

INSERT INTO collections_example (id, list_example)
VALUES(1, ['1-one', '2-two']);



Cassandra Collections - List Operations

• Adding an element to the end of a list

UPDATE collections_example
SET list_example = list_example + ['3-three'] WHERE id = 1;

• Adding an element to the beginning of a list

UPDATE collections_example
SET list_example = ['0-zero'] + list_example WHERE id = 1;

• Deleting an element from a list

UPDATE collections_example
SET list_example = list_example - ['3-three'] WHERE id = 1;
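
Lists can also be addressed by position. A short sketch against the same collections_example row; the replacement value '2-dos' is made up for illustration. Positional list operations are generally documented as needing an internal read before the write, so they are likely to cost more than an append or prepend.

```cql
-- Replace the element at index 1 (lists are zero-indexed).
UPDATE collections_example
SET list_example[1] = '2-dos' WHERE id = 1;

-- Remove the element at index 0.
DELETE list_example[0]
FROM collections_example WHERE id = 1;
```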



Cassandra Collections - Map

• Key and value
• Key is sorted by CQL type comparator

map_example map<int,text>

Collection name: map_example. Collection type: map. Key CQL type: int. Value CQL type: text.

INSERT INTO collections_example (id, map_example)
VALUES(1, { 1 : 'one', 2 : 'two' });



Cassandra Collections - Map Operations

• Add an element to the map

UPDATE collections_example
SET map_example[3] = 'three' WHERE id = 1;

• Update an existing element in the map

UPDATE collections_example
SET map_example[3] = 'tres' WHERE id = 1;

• Delete an element in the map

DELETE map_example[3]
FROM collections_example WHERE id = 1;



User model - Super!

• Take boring user table and kick it up
• Great for static + some dynamic
• Takes advantage of row level isolation

CREATE TABLE user_with_location (
username text PRIMARY KEY,
first_name text,
last_name text,
address1 text,
city text,
postal_code text,
last_login timestamp,
location_by_date map<timeuuid,text>
);



Super user profile - Operations

• Adding new login locations to the map

UPDATE user_with_location
// dateOf() converts the timeuuid from now() to a timestamp;
// map + {...} appends instead of replacing the whole map
SET last_login = dateOf(now()), location_by_date = location_by_date + {now() : '123.123.123.1'}
WHERE username='PatrickMcFadin';

• Adding new login locations to the map + TTL!

UPDATE user_with_location
USING TTL 2592000 // 30 Days
SET last_login = dateOf(now()), location_by_date = location_by_date + {now() : '123.123.123.1'}
WHERE username='PatrickMcFadin';
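
USING TTL applies to every column written by the statement, so in the TTL example last_login would expire together with the map entry. A sketch of one way to keep the two apart, assuming only the location history should age out. dateOf() is used on the assumption that now() yields a timeuuid while last_login is a timestamp; on Cassandra 2.2 and later, toTimestamp(now()) is the equivalent.

```cql
-- No TTL: last_login stays until it is overwritten.
UPDATE user_with_location
SET last_login = dateOf(now())
WHERE username='PatrickMcFadin';

-- TTL covers only the map entry written here (2592000 s = 30 days).
UPDATE user_with_location
USING TTL 2592000
SET location_by_date = location_by_date + {now() : '123.123.123.1'}
WHERE username='PatrickMcFadin';
```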



Indexing

• Indexing expresses application intent
• Fast access to specific queries
• Secondary indexes != relational indexes
• Use information you have. No pre-reads.

Goals:
1. Create row key for speed
2. Use wide rows for efficiency



Keyword index

• Use a word as a key
• Columns are the occurrences
• Ex: Index of tag words about videos

CREATE TABLE tag_index (
tag varchar,
videoid uuid,
timestamp timestamp,
PRIMARY KEY (tag, videoid)
);

Row layout: tag (row key: fast) | VideoId1 .. VideoIdN (wide row: efficient)
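
Using the index is a write per tag and a read by row key. A minimal sketch; the tag 'cassandra' and the video UUID are hypothetical values, not from the slides:

```cql
-- Hypothetical tag and video ID; illustrative values only.
INSERT INTO tag_index (tag, videoid, timestamp)
VALUES ('cassandra', 99051fe9-6a9c-46c2-b949-38ef78858dd0, '2013-05-16 10:00:00');

-- One row-key lookup returns every video carrying the tag.
SELECT videoid
FROM tag_index
WHERE tag = 'cassandra';
```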



Partial word index

• Where row size will be large
• Take one part for key, rest for column name

CREATE TABLE email_index (
domain varchar,
user varchar,
username varchar,
PRIMARY KEY (domain, user)
);

User: tcodd Email: tcodd@relational.com

INSERT INTO email_index (domain, user, username)
VALUES ('@relational.com','tcodd', 'tcodd');
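
Lookups follow the same split. A sketch, assuming the application breaks the email address at the '@' before querying:

```cql
-- Resolve an email address to a username:
-- domain is the row key, user is the column name within that row.
SELECT username
FROM email_index
WHERE domain = '@relational.com'
  AND user = 'tcodd';

-- All indexed users in one domain share a row.
SELECT user, username
FROM email_index
WHERE domain = '@relational.com';
```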



Partial word index - Super!

• Create partitions + partial indexes FTW

CREATE TABLE product_index (
store int,
part_number0_3 int,
part_number4_9 int,
count int,
PRIMARY KEY ((store,part_number0_3), part_number4_9)
);

Compound row key! (store, part_number0_3)

• Store #8675309 has 3 of part# 7079748575

INSERT INTO product_index (store,part_number0_3,part_number4_9,count)
VALUES (8675309,7079,748575,3);

SELECT count
FROM product_index
WHERE store = 8675309
AND part_number0_3 = 7079
AND part_number4_9 = 748575;

Fast and efficient!
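
Because the part-number suffix is a clustering column, the same table also answers prefix and range questions inside one partition. A sketch; the range bounds are illustrative and not from the slides:

```cql
-- List every part in store 8675309 whose number starts with 7079.
-- The full compound row key is given; the clustering column is left open.
SELECT part_number4_9, count
FROM product_index
WHERE store = 8675309
  AND part_number0_3 = 7079;

-- Narrow to a range of suffixes within the same partition.
SELECT part_number4_9, count
FROM product_index
WHERE store = 8675309
  AND part_number0_3 = 7079
  AND part_number4_9 >= 700000
  AND part_number4_9 <  800000;
```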



Bit map index

• Multiple parts to a key
• Create a truth table of the different combinations
• Inserts == the number of combinations (2^n - 1 for n fields)
- 3 fields? 7 options (Not going to use null choice)
- 4 fields? 15 options



Bit map index

• Find a car in a lot by variable combinations

Make | Model | Color | Combination
-----+-------+-------+-----------------
     |       |   x   | Color
     |   x   |       | Model
     |   x   |   x   | Model+Color
  x  |       |       | Make
  x  |       |   x   | Make+Color
  x  |   x   |       | Make+Model
  x  |   x   |   x   | Make+Model+Color



Bit map index - Table create

• Make a table with three different key combos

CREATE TABLE car_location_index (
make varchar,
model varchar,
color varchar,
vehical_id int,
lot_id int,
PRIMARY KEY ((make,model,color),vehical_id)
);

Compound row key with three different options



Bit map index - Adding records

• Pre-optimize for 7 possible questions on insert

INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','Mustang','Blue',1234,8675309);

INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','Mustang','',1234,8675309);

INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','','Blue',1234,8675309);

INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','','',1234,8675309);

INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','Mustang','Blue',1234,8675309);

INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','Mustang','',1234,8675309);

INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','','Blue',1234,8675309);



Bit map index - Selecting records

• Different combinations now possible

SELECT vehical_id,lot_id
FROM car_location_index
WHERE make = 'Ford'
AND model = ''
AND color = 'Blue';

 vehical_id | lot_id
------------+---------
       1234 | 8675309

SELECT vehical_id,lot_id
FROM car_location_index
WHERE make = ''
AND model = ''
AND color = 'Blue';

 vehical_id | lot_id
------------+---------
       1234 | 8675309
       8765 | 5551212
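
Every query must name all three key fields, with the empty string standing in for "don't care". As a further sketch, here is the Make+Model combination from the truth table; it matches the ('Ford','Mustang','') entry written at insert time:

```cql
-- Any Ford Mustang regardless of color.
-- All three parts of the compound row key are required, so the unused
-- field is passed as the empty string rather than omitted.
SELECT vehical_id, lot_id
FROM car_location_index
WHERE make = 'Ford'
  AND model = 'Mustang'
  AND color = '';
```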



Feeling super yet?

• Use these skills. Save you they will.
• Don’t settle for boring data models
• Stay tuned for more!
• Final part will be at the Cassandra Summit: June 11th

The world’s next top data model



Be there!!!

Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don’t miss it.

Here is my discount code! Use it: PMcVIP



Bonus!

• DataStax Java Driver Preso - June 12th
• Download today!

https://github.com/datastax/java-driver



Thank You

Q&A
