
Teradata Vantage SQL Basics

Uploaded by regardsjustin
Copyright © All Rights Reserved

Notes

One of the most important differences between joins and subqueries is that a join requires you to establish a one-to-many relationship yourself, whereas a subquery provides this automatically.
This example illustrates what happens when the join relationship is many-to-many: unintended rows appear in the final result set. In the example, employee 1015 works only in department 501, but because the join is on the manager number, the result set shows that person working in both department 501 and department 402, which is certainly not the real situation. Other employees (1018, 1023, 1006, 1008) share the same fate. It would be difficult, if not impossible, to look at this result set and tell who really works in which department.
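The effect can be reproduced on a tiny scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for Teradata (the join semantics shown are standard SQL, but SQLite is not Teradata). The table layouts and rows are hypothetical; employee numbers 1015 and 1018 are borrowed from the Notes only for readability, and departments 501 and 402 are assumed to share manager 801 so that the join on the manager number becomes many-to-many.

```python
import sqlite3

# Hypothetical tables: departments 501 and 402 share manager 801,
# so manager_employee_number is non-unique on both sides of the join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_number         INTEGER PRIMARY KEY,
        department_number       INTEGER,
        manager_employee_number INTEGER
    );
    CREATE TABLE department (
        department_number       INTEGER PRIMARY KEY,
        manager_employee_number INTEGER
    );
    INSERT INTO employee VALUES (1015, 501, 801), (1018, 501, 801), (1020, 403, 1005);
    INSERT INTO department VALUES (501, 801), (402, 801), (403, 1005);
""")

# Many-to-many: joining on the shared manager number.
many_to_many = conn.execute("""
    SELECT e.employee_number, d.department_number
    FROM employee AS e
    JOIN department AS d
      ON e.manager_employee_number = d.manager_employee_number
    ORDER BY e.employee_number, d.department_number
""").fetchall()

# One-to-many: joining on the department's primary key.
one_to_many = conn.execute("""
    SELECT e.employee_number, d.department_number
    FROM employee AS e
    JOIN department AS d
      ON e.department_number = d.department_number
    ORDER BY e.employee_number
""").fetchall()

print(many_to_many)  # 1015 and 1018 each appear in both 402 and 501
print(one_to_many)   # each employee appears exactly once
```

Only the join column differs between the two queries; the spurious rows come entirely from joining on a column that is non-unique on both sides.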

NOT IN vs. Inner Join


● Remember that every IN-Subquery can be rewritten as an Inner Join based on
equality
● But NOT IN is a condition based on non-equality
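A small sketch of both bullets, again using Python's sqlite3 module as a stand-in for Teradata with hypothetical department and employee rows. It shows an IN-subquery rewritten as an equality inner join (DISTINCT is needed because a department with two employees would otherwise appear twice), and why simply switching the join to non-equality (`<>`) does not reproduce NOT IN.

```python
import sqlite3

# Hypothetical tables: department 402 has no employees, department 401 has two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (department_number INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        employee_number   INTEGER PRIMARY KEY,
        department_number INTEGER
    );
    INSERT INTO department VALUES (401), (402), (501);
    INSERT INTO employee VALUES (1, 401), (2, 401), (3, 501);
""")

def query(sql):
    return conn.execute(sql).fetchall()

# IN-subquery: departments that have at least one employee.
in_subquery = query("""
    SELECT department_number FROM department
    WHERE department_number IN (SELECT department_number FROM employee)
    ORDER BY department_number
""")

# The same result as an inner join on equality; DISTINCT removes the
# duplicate row produced by department 401's two employees.
equality_join = query("""
    SELECT DISTINCT d.department_number
    FROM department AS d
    JOIN employee AS e ON d.department_number = e.department_number
    ORDER BY d.department_number
""")

# NOT IN: departments without employees.
not_in = query("""
    SELECT department_number FROM department
    WHERE department_number NOT IN (SELECT department_number FROM employee)
    ORDER BY department_number
""")

# Replacing "=" with "<>" does not give NOT IN: each department matches
# every employee of any *other* department, so all three come back.
non_equality_join = query("""
    SELECT DISTINCT d.department_number
    FROM department AS d
    JOIN employee AS e ON d.department_number <> e.department_number
    ORDER BY d.department_number
""")

print(in_subquery, equality_join)  # [(401,), (501,)] twice
print(not_in, non_equality_join)   # [(402,)] vs [(401,), (402,), (501,)]
```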

Notes
There are multiple ways to rewrite NOT IN. The best choice for finding rows in table A that have no matching data in table B is a NOT EXISTS correlated subquery (covered in a later module).
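As a preview, here is a minimal sketch of a NOT EXISTS correlated subquery, plus an outer-join rewrite as one of the other possible forms. It uses Python's sqlite3 module as a stand-in for Teradata and hypothetical tables; it only shows that the forms return the same rows here and makes no claim about how Teradata plans them.

```python
import sqlite3

# Hypothetical tables: department 402 has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (department_number INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        employee_number   INTEGER PRIMARY KEY,
        department_number INTEGER
    );
    INSERT INTO department VALUES (401), (402), (501);
    INSERT INTO employee VALUES (1, 401), (2, 401), (3, 501);
""")

# NOT EXISTS correlated subquery: the inner query refers to the
# outer department row through d.department_number.
not_exists = conn.execute("""
    SELECT d.department_number
    FROM department AS d
    WHERE NOT EXISTS (
        SELECT 1
        FROM employee AS e
        WHERE e.department_number = d.department_number
    )
    ORDER BY d.department_number
""").fetchall()

# Another rewrite: outer join, then keep the rows that found no match.
outer_join = conn.execute("""
    SELECT d.department_number
    FROM department AS d
    LEFT JOIN employee AS e ON e.department_number = d.department_number
    WHERE e.employee_number IS NULL
    ORDER BY d.department_number
""").fetchall()

print(not_exists, outer_join)  # [(402,)] [(402,)]
```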
Cross Join
A CROSS JOIN is a join in which no join condition is specified. Because there is no qualification, the database supplies a pseudo-condition of “WHERE 1=1”, which is true for every comparison. The example below demonstrates a Cross Join.

Notes
CROSS JOIN is a rarely used syntax. In our example, the query on the left is preferred because it shows the reader that a cross join is intended, while the one on the right may or may not be intentional (perhaps the writer forgot the join condition).
Since no join condition exists, the database invents one for us, whether we want it or not. Explain shows a condition of “1=1”, which always evaluates to true. Thus, you can read the row for employee “Short” as “Project the employee’s number, last name, and department name for each row in the department table where 1=1 is true.” The result is that these column values (from that employee's row) are projected once for each department row. The same thing happens all over again for each employee row.
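The three spellings discussed above can be compared side by side. This sketch uses Python's sqlite3 module as a stand-in for Teradata; the tables, employee numbers, and department names are hypothetical. The explicit CROSS JOIN, the comma syntax without a WHERE clause, and an inner join on the always-true condition 1=1 all return the same rows.

```python
import sqlite3

# Hypothetical data: two employees and three departments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_number INTEGER, last_name TEXT);
    CREATE TABLE department (department_name TEXT);
    INSERT INTO employee VALUES (1001, 'Short'), (1002, 'Brown');
    INSERT INTO department VALUES ('Education'), ('Research'), ('Sales');
""")

select_list = "e.employee_number, e.last_name, d.department_name"
order_by = "ORDER BY e.employee_number, d.department_name"

# Explicit CROSS JOIN: the intent is obvious to the reader.
explicit = conn.execute(
    f"SELECT {select_list} FROM employee AS e CROSS JOIN department AS d {order_by}"
).fetchall()

# Comma syntax with no WHERE clause: same result, but was the condition forgotten?
implicit = conn.execute(
    f"SELECT {select_list} FROM employee AS e, department AS d {order_by}"
).fetchall()

# Spelling out the always-true condition that Explain reports as (1=1).
one_equals_one = conn.execute(
    f"SELECT {select_list} FROM employee AS e JOIN department AS d ON 1=1 {order_by}"
).fetchall()

print(len(explicit))  # 2 employees x 3 departments = 6 rows
```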

4) We do an all-AMPs RETRIEVE step in TD_MAP1 from EMPLOYEE_SALES.d by
way of an all-rows scan with no residual conditions into Spool 2
(all_amps), which is duplicated on all AMPs in TD_Map1. The size of
Spool 2 is estimated with high confidence to be 18 rows (270 bytes).
The estimated time for this step is 0.00 seconds.
5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by way
of an all-rows scan, which is joined to EMPLOYEE_SALES.e by way of an
all-rows scan with no residual conditions. Spool 2 and
EMPLOYEE_SALES.e are joined using a product join, with a join
condition of ("(1=1)"). The result goes into Spool 1 (group_amps),
which is built locally on the AMPs. The size of Spool 1 is estimated
with high confidence to be 234 rows (15,678 bytes). The estimated
time for this step is 0.01 seconds.

Cartesian Products
A completely unconstrained cross join is called a Cartesian product. It results when a CROSS JOIN is issued without a WHERE clause. In this case, each row of one table is joined to each row of the other table. The output of a Cartesian product is often not meaningful; however, Cartesian products do have useful applications. Any query that requires every row of one table to be compared with every row of another is a candidate for a Cartesian product using the cross join syntax.


Notes
The number of rows in the result set is the number of rows in the left table times the number of rows in the right table, and Explain shows a “Product Join”. An 8,000,000-row table and a 50,000-row table would yield a 400,000,000,000-row answer set.
Things to Notice:
Cartesian products often result accidentally from an inner join with improper aliasing or missing join conditions. Explaining the SELECT easily reveals this, because the optimizer will most likely perform the product join as the final step, with a very high estimated number of rows and a long estimated runtime.
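The row-count arithmetic, and the way a forgotten join condition produces it, can be checked with a short sketch. It uses Python's sqlite3 module as a stand-in for Teradata, and the small tables are hypothetical; `product_join_rows` is an illustrative helper, not a Teradata function.

```python
import sqlite3

def product_join_rows(left_rows: int, right_rows: int) -> int:
    """Rows returned by an unconstrained product join: left x right."""
    return left_rows * right_rows

print(f"{product_join_rows(8_000_000, 50_000):,}")  # 400,000,000,000

# Small-scale check with hypothetical tables: leaving out the join
# condition turns a 4-row inner join into a 4 x 3 = 12-row product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_number INTEGER, department_number INTEGER);
    CREATE TABLE department (department_number INTEGER);
    INSERT INTO employee VALUES (1, 401), (2, 401), (3, 402), (4, 501);
    INSERT INTO department VALUES (401), (402), (501);
""")

with_condition = conn.execute("""
    SELECT COUNT(*) FROM employee AS e, department AS d
    WHERE e.department_number = d.department_number
""").fetchone()[0]

missing_condition = conn.execute("""
    SELECT COUNT(*) FROM employee AS e, department AS d
""").fetchone()[0]

print(with_condition, missing_condition)  # 4 12
```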
Real-World Uses for Cartesian Product Joins
One important use is to generate a very large answer set, which allows you to benchmark system performance with large data throughputs. Cross joins are also used to create all possible combinations of values in two tables or columns for further processing, e.g., month/product, even if a product was not sold in a specific month.
Another use is for an airline to compare all possible combinations of cities in order to evaluate flight plan information, passenger rate structures, and mileage awards. For this, you would use a cross join.
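The month/product use case can be sketched as a cross join that builds the full grid, followed by an outer join that attaches sales where they exist. This uses Python's sqlite3 module as a stand-in for Teradata; the `months`, `products`, and `sales` tables and their rows are hypothetical.

```python
import sqlite3

# Hypothetical tables: product B had no sales in 2024-01.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE months   (sales_month TEXT);
    CREATE TABLE products (product TEXT);
    CREATE TABLE sales    (sales_month TEXT, product TEXT, amount INTEGER);
    INSERT INTO months   VALUES ('2024-01'), ('2024-02');
    INSERT INTO products VALUES ('A'), ('B');
    INSERT INTO sales    VALUES ('2024-01', 'A', 100),
                                ('2024-02', 'A', 50),
                                ('2024-02', 'B', 70);
""")

# The CROSS JOIN builds every month/product pair first; the LEFT JOIN then
# attaches sales where they exist, and COALESCE turns "no sales" into 0.
report = conn.execute("""
    SELECT m.sales_month, p.product, COALESCE(SUM(s.amount), 0) AS total
    FROM months AS m
    CROSS JOIN products AS p
    LEFT JOIN sales AS s
      ON s.sales_month = m.sales_month AND s.product = p.product
    GROUP BY m.sales_month, p.product
    ORDER BY m.sales_month, p.product
""").fetchall()

for row in report:
    print(row)
```

The pair ('2024-01', 'B') appears with a total of 0 even though no sale exists for it, which is exactly what the cross join contributes.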

Mistakes on Table Aliasing


Be careful! Do not alias a table and then use the table name instead of the alias.
A few examples are illustrated below. Only the first one fails with an error; all the others cause a three-table join: an Inner Join between dept and emp followed by a Cross Join to employee!

So, Why a Cross Join?


To understand this, you need to execute the following query:

Notes

A table alias is not really an alias: it replaces the table name within that query. When writing joins with aliases, you must be careful to always reference a table by its alias and never by the aliased table name. In our examples, the table Employee has been aliased as “emp” in the FROM clause, but the join condition references Employee by its table name instead of the alias. In the first example, the SQL-92 join syntax requires the join condition to reference only the joined tables, so the query fails with "3782: Improper column reference in the search condition of a joined table". The SQL-89 syntax, however, is interpreted as joining three tables, namely “Emp,” “Dept,” and “Employee.”
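SQLite does not add tables that are missing from FROM the way Teradata does, so this sketch (Python's sqlite3 module as a stand-in, hypothetical tables) spells out the three-table FROM clause that the SQL-89 query effectively becomes. Which copy of employee ends up cross joined depends on where the stray table name appears; the row count is multiplied either way.

```python
import sqlite3

# Hypothetical data: 3 employees, 2 departments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_number INTEGER, department_number INTEGER);
    CREATE TABLE department (department_number INTEGER, department_name TEXT);
    INSERT INTO employee VALUES (1, 401), (2, 401), (3, 501);
    INSERT INTO department VALUES (401, 'Support'), (501, 'Sales');
""")

# Intended query: every reference goes through the aliases.
intended = conn.execute("""
    SELECT emp.employee_number, dept.department_name
    FROM employee AS emp, department AS dept
    WHERE emp.department_number = dept.department_number
""").fetchall()

# What the mistake amounts to: the WHERE clause names "employee" instead of
# "emp", which brings in a third, unaliased copy of the table. The condition
# now relates that copy to dept, and emp has no condition at all, so it is
# cross joined to the result.
as_interpreted = conn.execute("""
    SELECT emp.employee_number, dept.department_name
    FROM employee AS emp, department AS dept, employee
    WHERE employee.department_number = dept.department_number
""").fetchall()

print(len(intended), len(as_interpreted))  # 3 9
```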


To understand what is happening, you must realize that the FROM clause is not required in a Teradata SQL request (although it is required in Standard SQL). Teradata was implemented before Standard SQL existed; its initial query language was called TEQUEL (TEradata QUEry Language), whose syntax did not require tables to be listed in a FROM clause. As a result, a table referenced only through a qualified column name is added to the query as an additional table.

The request RETRIEVE employee.last_name contains enough information for the Parser to resolve both the table and the column name, and it returns:

last_name
----------------------------------------
Hopkins
Ratzlaff
Rogers
Rogers
Kanieski
Crane
Stein
Johnson
Short
Brown
...

Planning n-way Joins


If more than two relations are specified in a join, the Optimizer reduces that join to a series of binary joins and then attempts to determine the most cost-effective join order. Because the Optimizer uses column statistics to choose the least costly join plan from the candidate set it generates, the plans it generates depend heavily on the accuracy of those statistics.

Column projection and row selection are done before the join.
n-way joins are reduced to a series of binary joins.

Query optimizers use trees to build and analyze optimal join orders; the most common are (n = number of relations joined):

● Left-deep Search Tree:

○ Number of join orders = n!

● Bushy Search Tree:

○ Number of join orders = (2n-2)!/(n-1)!
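To see how quickly these counts grow, this sketch evaluates both formulas and cross-checks the bushy formula by brute-force counting of binary join trees. It assumes that (A join B) and (B join A) count as different orders, which is what the formula counts; it enumerates the full search space before any pruning and is not a model of how the Teradata Optimizer searches.

```python
from functools import lru_cache
from itertools import combinations
from math import factorial


def left_deep_orders(n: int) -> int:
    """Join orders for a left-deep tree over n relations: n!."""
    return factorial(n)


def bushy_orders(n: int) -> int:
    """Join orders for a bushy tree over n relations: (2n-2)!/(n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


def count_bushy_trees(n: int) -> int:
    """Brute force: count binary join trees over n labeled relations,
    treating (A join B) and (B join A) as different orders."""

    @lru_cache(maxsize=None)
    def trees(relations: frozenset) -> int:
        if len(relations) == 1:
            return 1
        total = 0
        members = sorted(relations)
        # Split the set into a non-empty left input and a non-empty right input.
        for size in range(1, len(members)):
            for left in combinations(members, size):
                left_set = frozenset(left)
                total += trees(left_set) * trees(relations - left_set)
        return total

    return trees(frozenset(range(n)))


for n in range(2, 7):
    print(n, left_deep_orders(n), bushy_orders(n), count_bushy_trees(n))
```

For six relations this gives 720 left-deep orders versus 30,240 bushy orders, which is why heuristics that prune the search space early matter.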

Notes

Join Order Search Trees

Query optimizers use trees to build and analyze optimal join orders. The join search tree types used most frequently by relational database optimizers are the left-deep tree and the bushy tree.

When a left-deep search tree is used to analyze possible join orders, the number of possibilities produced is relatively small.

Bushy trees are an effective way to generate many more join order combinations. At the same time, the number of combinations generated can be prohibitive, so the Optimizer uses several heuristics to prune the search space. Even so, this method produces an order of magnitude more join possibilities to evaluate than the left-deep tree method. Bushy trees also make it possible to perform some joins in parallel.

The Optimizer uses various combinations of join plan search trees, sometimes mixing left-deep, bushy, and even right-deep branches within the same tree.

Possible Join Orders as a Function of the Number of Relations To Be Joined

The possibilities for ordering binary joins escalate rapidly as the total number of relations joined increases. The Optimizer is very intelligent about looking for plans and uses numerous field-proven heuristics to eliminate more costly plans from consideration early in the costing process, which keeps the join plan search space manageable.

Summary
● Column values may be projected from any table of a join

● Subqueries and inner joins can both return inner result sets

● Inner joins have both an implicit form and an explicit form

● Inner joins typically involve one-to-many relationships based on equality

● A table may be joined to itself

● Inner joins cannot return outer (NOT IN) result sets, as subqueries can

● Incorrect table and column references can cause incorrect result sets

💻 Labs: SQL Inner Joins


Try completing the labs in the AWS lab environment. When you get the desired results, compare your code with the code under the Solution tabs.

LAB 1
Select all currently active (legacy_flag = 0) jobs with a job_code in the 33x.xxx range that have assigned (= exist in br_payroll) active (hire_end_date IS NULL) employees.
Order the result set by job_code.
HELP TABLE br_payroll;
HELP TABLE br_jobs;

SOLUTION
br_jobs.job_code is unique, but the join returns one row per matching employee, resulting in duplicate rows; DISTINCT is needed to remove them.
(A subquery does this automatically.)


LAB 2
Select all active (hire_end_date IS NULL) employees with an annual_salary less than 80,000 whose job_title contains administrator.
Concatenate first_name and last_name and alias it to fullname.
Order the result set by birthday within year.
HELP TABLE br_payroll;
HELP TABLE br_jobs;

SOLUTION
No DISTINCT needed: the rows come from br_payroll, the "many" side of the join (br_payroll.job_code is non-unique), and because br_jobs.job_code is unique, each employee matches at most one job.

LAB 3
Write a query to return those active employees (active = hire_end_date IS NULL) with a higher annual_salary than the manager of their department.
Show name and salary for both employee and manager and calculate their salary difference.
Order the result set by descending difference.
HELP TABLE br_payroll;
HELP TABLE br_departments;

SOLUTION
br_payroll is used twice, similar to a self-join.

LAB 4
Select all districts with fewer than 60,000 inhabitants (num_inhabitants) having accounts with a loan at status 'D' (= running contract, client in debt).
Order the result set by district_name.
HELP TABLE fin_district;
HELP TABLE fin_account;
HELP TABLE fin_loan;

SOLUTION
Again, DISTINCT is needed to remove duplicate rows.
