MariaDB 10.3 supports system-versioned tables, also known as temporal tables. They allow us to query data as it was at any point in time, or to see how it evolved over a certain period.
This document provides an overview of temporal tables in MariaDB. It discusses:
- The history of temporal table implementations in proprietary and open source databases.
- How MariaDB supports both system-time and application-time temporal tables, allowing tables to track changes over time.
- Examples of creating and querying application-time tables to track ticket changes, and system-versioned tables to store row versions.
- Best practices for indexes, queries, and handling deletes/updates when using temporal tables.
- Potential future extensions and limitations, like improved datetime support and issues with certain storage engines.
2. € whoami
● Federico Razzoli
● Freelance consultant
● Writing SQL since MySQL 3.23
● [email protected]
● I love open source, sharing,
collaboration, win-win, etc
● I love MariaDB, MySQL, Postgres, etc
○ Even Db2, somehow
4. Data versioning… why?
Several reasons:
● Auditing
● Travel back in time
○ Which / how many products were we selling in Dec 2016?
● Track a row’s history
○ History of the relationship with a customer
● Compare today’s situation with 6 months ago
○ How many EU employees did we lose because of Brexit?
● Statistics on data changes
○ Sales trends
● Find correlations
○ Sales decrease because we invest less in web marketing
5. Example
SELECT * FROM users WHERE id = 24 \G
*************************** 1. row
id: 24
first_name: Jody
last_name: Whittaker
email: [email protected]
gender: F
birth_date: NULL
1 row in set (0.00 sec)
6. Method 1: track row versions
SELECT * FROM user_changes \G
*************************** 1. row
id: 1
first_name: Jody
last_name: Whittaker
email: [email protected]
gender: F
valid_from: 2018-10-07
valid_to: NULL
1 row in set (0.00 sec)
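A minimal sketch of how this method is typically maintained by hand (hypothetical statements, assuming the user_changes table above): an UPDATE closes the current version of the row and inserts a new one.

-- close the current version
UPDATE user_changes
SET valid_to = CURRENT_DATE
WHERE id = 1 AND valid_to IS NULL;

-- insert the new version, valid from today
INSERT INTO user_changes
(first_name, last_name, email, gender, valid_from, valid_to)
VALUES ('Jody', 'Whittaker', '[email protected]', 'F', CURRENT_DATE, NULL);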
7. Method 1: track row versions
What we can do (easily):
● Undo a column change
● Undo an UPDATE/DELETE
● Get the full state of a row at a given time
● See how often a row changes
Harder to do:
● Audit changes
● See how often a value changed over time
8. Method 2: track field changes
SELECT * FROM user_changes \G
*************************** 1. row
id: 1
user_id: 24
field: email
old_value: [email protected]
new_value: [email protected]
valid_from: 2018-10-07
valid_to: NULL
1 row in set (0.00 sec)
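A minimal sketch of how this method can be automated (a hypothetical trigger, assuming the user_changes table above and a users base table):

DELIMITER //
CREATE TRIGGER users_track_email
AFTER UPDATE ON users
FOR EACH ROW
BEGIN
  -- NULL-safe comparison: record the change only if email really changed
  IF NOT (NEW.email <=> OLD.email) THEN
    -- close the current version of this field
    UPDATE user_changes
      SET valid_to = CURRENT_DATE
      WHERE user_id = OLD.id AND field = 'email' AND valid_to IS NULL;
    -- open a new version
    INSERT INTO user_changes (user_id, field, old_value, new_value, valid_from)
      VALUES (OLD.id, 'email', OLD.email, NEW.email, CURRENT_DATE);
  END IF;
END //
DELIMITER ;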
9. Method 2: track field changes
What we can do (easily):
● Undo a column change
● Audit changes
● See how a certain value changed over time
Harder to do:
● Undo an UPDATE/DELETE
● Get/restore an old row version
● See how often a row changes over time
10. System-Versioned Tables
They automagically implement Method 1 above (track row versions)
● You INSERT, DELETE, UPDATE and SELECT data, getting the same
results you would get with a regular table
● Old versions of the rows are stored in the same (logical) table
● To get old data, you need to use a special syntax, like:
SELECT … AS OF TIMESTAMP '2018/01/01 16:30:00';
12. Where are sysver tables implemented?
In the proprietary DBMS world:
● Oracle 11g (2007)
● Db2 (2012)
● SQL Server 2016
Sometimes they are called Temporal Tables
In Db2, a temporal table can use system-period or application-period
13. Where are sysver tables implemented?
In the open source world:
● PostgreSQL, as an extension
● CockroachDB
● MariaDB 10.3
PostgreSQL and CockroachDB implementations have important
limitations
14. Where are sysver tables implemented?
In the NoSQL world:
● In HBase, rows have a version property
16. Overview
● Implemented in MariaDB 10.3 (first released in Apr 2017; stable since May 2018)
● You must have row-start and row-end Generated Columns
○ Type: TIMESTAMP(6) or DATETIME(6)
○ You decide the names
○ These are Invisible Columns (10.3 feature)
● Any storage engine
○ Except CONNECT (MDEV-15968)
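For instance, a minimal sketch (if you don't declare the columns yourself, MariaDB generates invisible ROW_START and ROW_END columns):

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING;
-- invisible columns don't appear in SELECT *...
SELECT * FROM t;
-- ...but they can be selected explicitly:
SELECT x, ROW_START, ROW_END FROM t;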
17. ALTER TABLE
● Forbidden by default
○ Changes that only affect metadata are also forbidden
○ Allowed if system_versioning_alter_history = 'KEEP'
● ADD COLUMN adds a column that is set to the current DEFAULT value
or NULL for all old row versions
○ When was the column added?
● DROP COLUMN also affects old versions of the rows
○ The column’s history is lost
● CHANGE COLUMN also affects old versions of the rows
○ The column’s history is modified
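A minimal sketch (assuming the employee table defined on the next slide):

-- allow ALTER TABLE on a system-versioned table, for this session:
SET @@system_versioning_alter_history = 'KEEP';
-- old row versions will show the new column as NULL (or its DEFAULT):
ALTER TABLE employee ADD COLUMN department VARCHAR(50);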
18. Example
CREATE OR REPLACE TABLE employee (
...
valid_from TIMESTAMP(6)
GENERATED ALWAYS AS ROW START
COMMENT 'When the row was INSERTed',
valid_to TIMESTAMP(6)
GENERATED ALWAYS AS ROW END
COMMENT 'When row was DELETEd or UPDATEd',
PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
)
WITH SYSTEM VERSIONING,
ENGINE InnoDB;
19. Adding versioning to an existing table
ALTER TABLE employee
LOCK = SHARED,
ALGORITHM = COPY,
ADD COLUMN valid_from TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
ADD COLUMN valid_to TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
ADD PERIOD FOR SYSTEM_TIME(valid_from, valid_to),
ADD SYSTEM VERSIONING
;
● Notice ALGORITHM=COPY and LOCK=SHARED
20. Querying historical data: point in time
SELECT * FROM my_table FOR SYSTEM_TIME
● AS OF TIMESTAMP '2018-10-01 12:00:00'
● FROM '2018-10-01 00:00:00' TO '2018-11-01 00:00:00'
● BETWEEN (NOW() - INTERVAL 1 YEAR) AND NOW()
● ALL

SELECT * FROM my_table FOR SYSTEM_TIME ALL
WHERE valid_from < @end AND valid_to > @start;
21. Querying historical data: point in time
SELECT * FROM my_table FOR SYSTEM_TIME
● AS OF TIMESTAMP '2018-10-01 12:00:00'
● AS OF (SELECT valid_from FROM employee WHERE id=50)
● AS OF @some_event_timestamp
22. Querying historical data: time range
SELECT * FROM my_table FOR SYSTEM_TIME
● FROM '2018-10-01 00:00:00' TO '2018-11-01 00:00:00'
● FROM (SELECT ...) TO (SELECT ...)
● FROM @some_event TO @another_event
23. Querying historical data: time range
SELECT * FROM my_table FOR SYSTEM_TIME
● BETWEEN '2018-10-01 00:00:00' AND '2018-11-01 00:00:00'
● BETWEEN (NOW() - INTERVAL 1 YEAR) AND NOW()
● BETWEEN @some_event AND (SELECT ...)
24. Querying historical data: example
INSERT INTO customer (id, first_name, last_name, email) VALUES
(1, 'William', 'Hartnell', '[email protected]'),
(2, 'Tom', 'Baker', '[email protected]');
SET @beginning_of_time := NOW(6);
DELETE FROM customer WHERE id = 1;
UPDATE customer SET email = '[email protected]' WHERE id=2;
INSERT INTO customer (id, first_name, last_name, email) VALUES
(3, 'Peter', 'Capaldi', '[email protected]');
SET @twelve_regeneration := NOW(6);
INSERT INTO customer (id, first_name, last_name, email) VALUES
(4, 'Jody', 'Whittaker', '[email protected]');
25. Querying historical data: example
SELECT id, first_name, last_name, valid_from, valid_to
FROM customer
FOR SYSTEM_TIME
BETWEEN @beginning_of_time AND @twelve_regeneration;
+----+------------+-----------+----------------------------+----------------------------+
| id | first_name | last_name | valid_from | valid_to |
+----+------------+-----------+----------------------------+----------------------------+
| 1 | William | Hartnell | 2018-11-04 13:50:33.627753 | 2018-11-04 13:50:33.633414 |
| 2 | Tom | Baker | 2018-11-04 13:50:33.627753 | 2018-11-04 13:50:33.638419 |
| 2 | Tom | Baker | 2018-11-04 13:50:33.638419 | 2038-01-19 03:14:07.999999 |
| 3 | Peter | Capaldi | 2018-11-04 13:50:33.644880 | 2038-01-19 03:14:07.999999 |
+----+------------+-----------+----------------------------+----------------------------+
26. Partitions
● By default, the history is stored together with current data
● You can put the history on separate partitions
● And limit them:
○ By number of rows
○ By time
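A minimal sketch (hypothetical table and partition names, using MariaDB's PARTITION BY SYSTEM_TIME syntax):

-- rotate history into monthly partitions; current rows stay in p_cur:
CREATE OR REPLACE TABLE t (x INT) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 MONTH (
  PARTITION p0 HISTORY,
  PARTITION p1 HISTORY,
  PARTITION p_cur CURRENT
);
-- alternatively, limit each history partition by number of rows:
-- PARTITION BY SYSTEM_TIME LIMIT 100000 (PARTITION p0 HISTORY, PARTITION p_cur CURRENT)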
27. Indexes
● The ROW END column is appended to UNIQUE indexes and the PK
● Other indexes are untouched
○ You may consider adding ROW END to some of your indexes
○ This is a good reason to define temporal columns explicitly
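For example (a hypothetical index, assuming the employee table above):

-- if we often query the history filtered by email, appending the ROW END
-- column helps queries that also restrict the validity period:
ALTER TABLE employee ADD INDEX idx_email_valid_to (email, valid_to);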
29. Basic Examples
● The first examples take advantage of the temporal columns to identify INSERTs,
DELETEs and UPDATEs
30. Get DELETEd rows
● Get canceled orders:
SET @eot := '2038-01-19 03:14:07.999999';
SELECT * FROM `order` FOR SYSTEM_TIME ALL
WHERE valid_to < @eot;
● Get orders canceled today:
SELECT COUNT(*)
FROM `order` FOR SYSTEM_TIME ALL
WHERE valid_to < @eot
AND valid_to BETWEEN DATE(NOW()) AND (DATE(NOW()) + INTERVAL 1 DAY);
31. Get INSERTed rows
● Get orders generated today:
SELECT id, MIN(valid_from) AS insert_time
FROM `order` FOR SYSTEM_TIME ALL
GROUP BY id
HAVING DATE(MIN(valid_from)) = DATE(NOW())
ORDER BY MIN(valid_from);
32. Get UPDATEd rows
● How many times orders were modified:
SELECT
-- exclude the INSERTions
id, (COUNT(*) - 1) AS how_many_edits
FROM `order` FOR SYSTEM_TIME ALL
-- exclude DELETions
WHERE valid_to < @eot
GROUP BY id
ORDER BY how_many_edits;
33. Debug mistakes in a single row
● Wrong data has been found in a row. The original version of the row was
correct, so we want to know when the mistake (or malicious change)
happened
34. Debug mistakes in a single row
● Find all versions of a row:
SELECT id, status, valid_from, valid_to
FROM `order` FOR SYSTEM_TIME ALL
WHERE id = 24;
35. Debug mistakes in a single row
● When an order was blocked:
SELECT DATE(MIN(valid_from)) AS block_date
FROM `order` FOR SYSTEM_TIME ALL
WHERE status = 'BLOCKED'
AND id = 24;
36. Debug mistakes in a single row
● Status is wrong. To find out how the problem happened, we want to check the
INSERT and all following status changes:
SELECT id, status, valid_from, valid_to
FROM (
SELECT
NOT (status <=> @prev_status) AS status_changed,
@prev_status := status,
id, status, valid_from, valid_to
FROM `order` FOR SYSTEM_TIME ALL
WHERE id = 24
ORDER BY valid_from
) t
WHERE status_changed = 1;
37. Debug mistakes in a single row
● We want to know if the status is the same as one month ago:
SELECT
present.id,
present.status AS current_status,
past.status AS past_status
FROM `order` present
INNER JOIN `order`
FOR SYSTEM_TIME AS OF TIMESTAMP
NOW() - INTERVAL 1 MONTH
AS past
ON present.id = past.id
ORDER BY present.id;
38. Stats on data changes
SELECT
AVG(amount), STDDEV(amount),
MAX(amount), MIN(amount),
COUNT(amount)
FROM account FOR SYSTEM_TIME ALL
WHERE customer_id = 24
AND valid_from BETWEEN '2016-01-01' AND NOW()
GROUP BY customer_id;