SQL Notes for Professionals
262  [...] Officer        48  0      1
239  Benefits Specialist  45  0      1
252  Buyer                50  0      0.111111111111111
251  Buyer                49  0.125  0.333333333333333
256  Buyer                49  0.125  0.333333333333333
253  Buyer                48  0.375  0.555555555555555
254  Buyer                48  0.375  0.555555555555555
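The PERCENT_RANK function ranks the entries within each group. For each entry, it returns the fraction of the
other entries in the same group that sort before it in the window's ORDER BY (that is, entries with lower values
when the order is ascending).
The CUME_DIST function is similar, except that it returns the fraction of entries in the group that sort before or
tie with the current entry, counting the entry itself.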
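The difference between the two functions is easiest to see on a small data set. The following is a minimal sketch, not the original example: it uses Python's standard sqlite3 module, and the scores table and its values are hypothetical. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical table and values, chosen only to illustrate the two functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, grp TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("a", "x", 10), ("b", "x", 20), ("c", "x", 20), ("d", "x", 30), ("e", "y", 5)],
)

# PERCENT_RANK = (rank - 1) / (rows in partition - 1)
# CUME_DIST    = (rows sorting before or tying with this row) / (rows in partition)
rank_rows = conn.execute(
    """
    SELECT name,
           PERCENT_RANK() OVER (PARTITION BY grp ORDER BY score) AS pct_rank,
           CUME_DIST()    OVER (PARTITION BY grp ORDER BY score) AS cume_dist
    FROM scores
    ORDER BY grp, score, name
    """
).fetchall()

for row in rank_rows:
    print(row)
```

In group x, the two tied rows share a PERCENT_RANK of 1/3 and a CUME_DIST of 0.75, while the single row in group y gets 0 and 1.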
Table items
id name tag
1 example unique_tag
2 foo simple
42 bar simple
3 baz hello
51 quux world
The goal is to return all of these rows and to know, for each one, whether its tag is also used by other rows:
SELECT id, name, tag, COUNT(*) OVER (PARTITION BY tag) > 1 AS flag FROM items
In case your database doesn't have OVER and PARTITION you can use this to produce the same result:
SELECT id, name, tag, (SELECT COUNT(tag) FROM items B WHERE tag = A.tag) > 1 AS flag FROM items A
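To check that the two queries really agree, here is a small sketch that loads the items table shown above into an in-memory SQLite database (an assumption; the text does not name a database) and runs both. The ORDER BY id is added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "example", "unique_tag"), (2, "foo", "simple"), (42, "bar", "simple"),
     (3, "baz", "hello"), (51, "quux", "world")],
)

# Window-function version (needs SQLite 3.25+).
windowed = conn.execute(
    "SELECT id, name, tag, COUNT(*) OVER (PARTITION BY tag) > 1 AS flag "
    "FROM items ORDER BY id"
).fetchall()

# Correlated-subquery version for databases without OVER / PARTITION BY.
correlated = conn.execute(
    "SELECT id, name, tag, "
    "(SELECT COUNT(tag) FROM items B WHERE tag = A.tag) > 1 AS flag "
    "FROM items A ORDER BY id"
).fetchall()

for row in windowed:
    print(row)
```

SQLite reports the boolean flag as 0 or 1; only the two rows tagged simple are flagged.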
The LAG() analytical function helps to solve the problem by returning for each row the value in the preceding row:
In case your database doesn't have LAG() you can use this to produce the same result:
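As a minimal sketch of both approaches, the script below compares LAG() with a correlated-subquery fallback. The readings table, its columns, and its values are hypothetical and are not taken from the original example; SQLite 3.25 or newer is assumed for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one value per id, ordered by id.
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 10), (2, 15), (3, 12)])

# LAG() returns the value from the preceding row in the window order,
# or NULL when there is no preceding row.
with_lag = conn.execute(
    "SELECT id, value, LAG(value) OVER (ORDER BY id) AS previous_value "
    "FROM readings ORDER BY id"
).fetchall()

# Fallback: look up the closest earlier row with a correlated subquery.
without_lag = conn.execute(
    """
    SELECT r.id, r.value,
           (SELECT p.value
            FROM readings AS p
            WHERE p.id < r.id
            ORDER BY p.id DESC
            LIMIT 1) AS previous_value
    FROM readings AS r
    ORDER BY r.id
    """
).fetchall()

print(with_lag)
```

The fallback is usually slower on large tables because the subquery runs once per row.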
date amount
2016-03-12 200
2016-03-11 -50
2016-03-14 100
2016-03-15 100
2016-03-10 -250
SELECT date, amount, SUM(amount) OVER (ORDER BY date ASC) AS running
FROM operations
ORDER BY date ASC
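The running total can be verified with a short script. This sketch assumes an in-memory SQLite database (3.25 or newer) and loads the operations rows shown above in their original, unsorted order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE operations (date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO operations VALUES (?, ?)",
    [("2016-03-12", 200), ("2016-03-11", -50), ("2016-03-14", 100),
     ("2016-03-15", 100), ("2016-03-10", -250)],
)

# SUM(...) OVER (ORDER BY date) adds up every row up to and including the current one.
running_rows = conn.execute(
    "SELECT date, amount, SUM(amount) OVER (ORDER BY date ASC) AS running "
    "FROM operations ORDER BY date ASC"
).fetchall()

for row in running_rows:
    print(row)
```

The balance starts at -250, bottoms out at -300 on 2016-03-11, and ends at 100.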
Instead of using two queries to get a count and then the rows, you can use an aggregate as a window function and
use the full result set as the window.
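A minimal sketch of this idea: an empty OVER () makes the whole result set the window, so every row carries the total count. The items table and its rows here are hypothetical, and SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for illustration.
conn.execute("CREATE TABLE items (id INTEGER, name TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "example", "unique_tag"), (2, "foo", "simple"), (42, "bar", "simple")],
)

# COUNT(*) OVER () counts the rows that survive the WHERE clause,
# and attaches that count to each returned row.
rows_with_total = conn.execute(
    "SELECT id, name, COUNT(*) OVER () AS total "
    "FROM items WHERE tag = 'simple' ORDER BY id"
).fetchall()

print(rows_with_total)
```

Note that the count reflects the filtered result, not the size of the underlying table.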
User_ID Completion_Date
1 2016-07-20
1 2016-07-21
2 2016-07-20
2 2016-07-21
2 2016-07-22
;WITH CTE AS
(SELECT *,
        ROW_NUMBER() OVER (PARTITION BY User_ID
                           ORDER BY Completion_Date DESC) Row_Num
 FROM Data)
SELECT * FROM CTE WHERE Row_Num <= n
Using n=1, you'll get the single most recent row per user_id:
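The following sketch runs the same ROW_NUMBER() pattern against the Data rows shown above, using an in-memory SQLite database (3.25 or newer, an assumption). The helper name latest_rows is hypothetical, and n is passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (User_ID INTEGER, Completion_Date TEXT)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?)",
    [(1, "2016-07-20"), (1, "2016-07-21"),
     (2, "2016-07-20"), (2, "2016-07-21"), (2, "2016-07-22")],
)

def latest_rows(n):
    """Return the n most recent rows per user, newest first within each user."""
    return conn.execute(
        """
        WITH CTE AS (
            SELECT User_ID, Completion_Date,
                   ROW_NUMBER() OVER (PARTITION BY User_ID
                                      ORDER BY Completion_Date DESC) AS Row_Num
            FROM Data
        )
        SELECT User_ID, Completion_Date
        FROM CTE
        WHERE Row_Num <= ?
        ORDER BY User_ID, Completion_Date DESC
        """,
        (n,),
    ).fetchall()

print(latest_rows(1))
```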
--Give a table name `Numbers` and a column `i` to hold the numbers
WITH Numbers(i) AS (
--Starting number/index
SELECT 1
--Top-level UNION ALL operator required for recursion
UNION ALL
--Iteration expression:
SELECT i + 1
--Table expression we first declared used as source for recursion
FROM Numbers
--Clause to define the end of the recursion
WHERE i < 5
)
--Use the generated table expression like a regular table
SELECT i FROM Numbers;
i
1
2
3
4
5
This method can be used with any number interval, as well as other types of data.
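As a sketch of both claims, the script below generates a stepped number interval and a range of dates with the same recursive pattern. It targets SQLite, which requires the RECURSIVE keyword; the helper names number_range and day_range are hypothetical, and both assume the start value is not past the end value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def number_range(start, stop, step=1):
    """Numbers from start up to stop (inclusive) in increments of step."""
    rows = conn.execute(
        """
        WITH RECURSIVE Numbers(i) AS (
            SELECT ?                      -- starting number
            UNION ALL
            SELECT i + ? FROM Numbers     -- iteration expression
            WHERE i + ? <= ?              -- end of the recursion
        )
        SELECT i FROM Numbers
        """,
        (start, step, step, stop),
    )
    return [r[0] for r in rows]

def day_range(first_day, last_day):
    """Consecutive ISO dates from first_day to last_day (inclusive)."""
    rows = conn.execute(
        """
        WITH RECURSIVE Days(d) AS (
            SELECT ?
            UNION ALL
            SELECT date(d, '+1 day') FROM Days WHERE d < ?
        )
        SELECT d FROM Days
        """,
        (first_day, last_day),
    )
    return [r[0] for r in rows]

print(number_range(10, 30, 10))
print(day_range("2016-03-10", "2016-03-12"))
```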
UNION ALL
-- get employees that have any of the previously selected rows as manager
SELECT ManagedByJames.Level + 1,
Employees.ID,
Employees.FName,
Employees.LName
FROM Employees
JOIN ManagedByJames
ON Employees.ManagerID = ManagedByJames.ID
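The fragment above is the recursive member of a CTE that walks a management hierarchy. A complete sketch might look like the following; the Employees rows, the choice of employee 1 as the starting point, and the anchor query are assumptions made for illustration, and the script targets SQLite (which needs the RECURSIVE keyword).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: ManagerID points at the ID of the employee's manager.
conn.execute(
    "CREATE TABLE Employees (ID INTEGER PRIMARY KEY, FName TEXT, LName TEXT, ManagerID INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "James", "Smith", None), (2, "John", "Johnson", 1),
     (3, "Michael", "Williams", 1), (4, "Johnathon", "Smith", 2),
     (5, "Ann", "Lee", None)],
)

managed = conn.execute(
    """
    WITH RECURSIVE ManagedByJames(Level, ID, FName, LName) AS (
        -- anchor: start with this row (assumed starting point)
        SELECT 1, ID, FName, LName
        FROM Employees
        WHERE ID = 1
        UNION ALL
        -- get employees that have any of the previously selected rows as manager
        SELECT ManagedByJames.Level + 1,
               Employees.ID,
               Employees.FName,
               Employees.LName
        FROM Employees
        JOIN ManagedByJames
          ON Employees.ManagerID = ManagedByJames.ID
    )
    SELECT Level, ID, FName, LName
    FROM ManagedByJames
    ORDER BY Level, ID
    """
).fetchall()

for row in managed:
    print(row)
```

Employee 5 has no path to employee 1 and therefore does not appear in the result.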
WITH ReadyCars AS (
SELECT *
FROM Cars
WHERE Status = 'READY'
)
SELECT ID, Model, TotalCost
FROM ReadyCars
ORDER BY TotalCost;
ID  Model       TotalCost
1   Ford F-150  200
2   Ford F-150  230
UNION ALL
-- Transition Sequence = Rest & Relax into Day Shift into Night Shift
-- RR (Rest & Relax) = 1
-- DS (Day Shift) = 2
-- NS (Night Shift) = 3
;WITH roster AS
(
SELECT @DateFrom AS RosterStart, 1 AS TeamA, 2 AS TeamB, 3 AS TeamC
UNION ALL
SELECT DATEADD(d, @IntervalDays, RosterStart),
CASE TeamA WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 1 END AS TeamA,
CASE TeamB WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 1 END AS TeamB,
CASE TeamC WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 1 END AS TeamC
FROM roster WHERE RosterStart < DATEADD(d, -@IntervalDays, @DateTo)
)
SELECT RosterStart,
ISNULL(LEAD(RosterStart) OVER (ORDER BY RosterStart), RosterStart + @IntervalDays) AS RosterEnd,
CASE TeamA WHEN 1 THEN 'RR' WHEN 2 THEN 'DS' WHEN 3 THEN 'NS' END AS TeamA,
CASE TeamB WHEN 1 THEN 'RR' WHEN 2 THEN 'DS' WHEN 3 THEN 'NS' END AS TeamB,
CASE TeamC WHEN 1 THEN 'RR' WHEN 2 THEN 'DS' WHEN 3 THEN 'NS' END AS TeamC
FROM roster
Result
That is, for week 1 TeamA is on R&R, TeamB is on Day Shift, and TeamC is on Night Shift.
WITH tbl AS (
SELECT id, name, parent_id
FROM mytable)
, tbl_hierarchy AS (
/* Anchor */
Clauses
CONNECT BY: Specifies the relationship that defines the hierarchy.
START WITH: Specifies the root nodes.
ORDER SIBLINGS BY: Orders results properly.
Parameters
NOCYCLE: Stops processing a branch when a loop is detected. Valid hierarchies are Directed Acyclic
Graphs, and circular references violate this construct.
Operators
PRIOR: Obtains data from the node's parent.
CONNECT_BY_ROOT: Obtains data from the node's root.
Pseudocolumns
LEVEL: Indicates the node's distance from its root.
CONNECT_BY_ISLEAF: Indicates a node without children.
CONNECT_BY_ISCYCLE: Indicates a node with a circular reference.
Functions
SYS_CONNECT_BY_PATH: Returns a flattened/concatenated representation of the path to the node
from its root.
SELECT *
FROM dept_income;
DepartmentName TotalSalary
HR 1900
Sales 600
number
--------
1
(1 row)
number
--------
1
(1 row)
number
--------
1
2
(2 rows)
SELECT *
FROM Employees -- this is a comment
WHERE FName = 'John'
/* This query
returns all employees */
SELECT *
FROM Employees
An example of where a foreign key is required: in a university, a course must belong to a department. The code
for this scenario is:
The following table will contain the information of the subjects offered by the Computer science branch:
(The data type of the Foreign Key must match the datatype of the referenced key.)
A Foreign Key must reference a UNIQUE (or PRIMARY) key in the parent table.
Entering a NULL value in a Foreign Key column does not raise an error.
Foreign Key constraints can reference tables within the same database.
Foreign Key constraints can refer to another column in the same table (self-reference).
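Some of these rules can be observed directly. The sketch below uses SQLite through Python's sqlite3 module; the Department and Course tables and their columns are hypothetical stand-ins for the university scenario. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, which is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default

# Parent table: the referenced column is the PRIMARY KEY.
conn.execute(
    "CREATE TABLE Department (Dept_Code TEXT PRIMARY KEY, Dept_Name TEXT NOT NULL)"
)
# Child table: Dept_Code has the same data type as the key it references.
conn.execute(
    """
    CREATE TABLE Course (
        Course_ID INTEGER PRIMARY KEY,
        Title     TEXT NOT NULL,
        Dept_Code TEXT REFERENCES Department (Dept_Code)
    )
    """
)

conn.execute("INSERT INTO Department VALUES ('CS', 'Computer Science')")
conn.execute("INSERT INTO Course VALUES (1, 'Databases', 'CS')")   # valid reference
conn.execute("INSERT INTO Course VALUES (2, 'Unassigned', NULL)")  # NULL raises no error

try:
    conn.execute("INSERT INTO Course VALUES (3, 'Orphan', 'XX')")  # no such department
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

course_count = conn.execute("SELECT COUNT(*) FROM Course").fetchone()[0]
print(orphan_rejected, course_count)
```

If NULL should not be allowed, declare the foreign key column NOT NULL as well.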
We will add a new table in order to store the powers of each super hero:
UPDATE Orders
SET Order_UID = orders_seq.NEXTVAL
WHERE Customer = 581;
SELECT *
FROM Employees
WHERE Salary = (SELECT MAX(Salary) FROM Employees)
SELECT EmployeeId
FROM Employee AS eOuter
WHERE Salary > (
SELECT AVG(Salary)
FROM Employee eInner
WHERE eInner.DepartmentId = eOuter.DepartmentId
)
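To see the correlated subquery at work, the following sketch runs the query above against a few hypothetical Employee rows in an in-memory SQLite database. The inner query is re-evaluated for each outer row, using that row's DepartmentId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeId INTEGER, DepartmentId INTEGER, Salary INTEGER)"
)
# Hypothetical data: department 10 averages 200, department 20 averages 500.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 300), (3, 20, 500), (4, 20, 500), (5, 30, 50)],
)

above_average = conn.execute(
    """
    SELECT EmployeeId
    FROM Employee AS eOuter
    WHERE Salary > (
        SELECT AVG(Salary)
        FROM Employee eInner
        WHERE eInner.DepartmentId = eOuter.DepartmentId
    )
    ORDER BY EmployeeId
    """
).fetchall()

print(above_average)
```

Only employee 2 earns more than the average of their own department; employees who earn exactly the average are excluded by the strict comparison.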
SELECT *
FROM Employees AS e
LEFT JOIN Supervisors AS s ON s.EmployeeID=e.EmployeeID
WHERE s.EmployeeID IS NULL
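The LEFT JOIN ... IS NULL pattern keeps only the rows that found no match on the right side. The sketch below, with hypothetical Employees and Supervisors rows in an in-memory SQLite database, compares it with an equivalent NOT EXISTS formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Supervisors (EmployeeID INTEGER)")
# Hypothetical data: only employee 2 is a supervisor.
conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.execute("INSERT INTO Supervisors VALUES (2)")

# Unmatched rows get NULLs in the columns of s, which the WHERE clause selects.
via_left_join = conn.execute(
    """
    SELECT e.EmployeeID
    FROM Employees AS e
    LEFT JOIN Supervisors AS s ON s.EmployeeID = e.EmployeeID
    WHERE s.EmployeeID IS NULL
    ORDER BY e.EmployeeID
    """
).fetchall()

via_not_exists = conn.execute(
    """
    SELECT e.EmployeeID
    FROM Employees AS e
    WHERE NOT EXISTS (SELECT * FROM Supervisors AS s
                      WHERE s.EmployeeID = e.EmployeeID)
    ORDER BY e.EmployeeID
    """
).fetchall()

print(via_left_join)
```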
A query that finds cities from the weather table whose daily temperature variation is greater than 20 returns:
city         temp_var
ST LOUIS     21
LOS ANGELES  31
LOS ANGELES  23
LOS ANGELES  31
LOS ANGELES  27
LOS ANGELES  28
LOS ANGELES  28
LOS ANGELES  32
Here, the subquery (SELECT avg(pop2000) FROM cities) is used to specify a condition in the WHERE clause. The
result is:
name           pop2000
San Francisco  776733
ST LOUIS       348189
Kansas City    146866
-- Or
EXEC Northwind.getEmployee @LastName = N'Ackerman', @FirstName = N'Pilar';
GO
-- Or
EXECUTE Northwind.getEmployee @FirstName = N'Pilar', @LastName = N'Ackerman';
GO
AS
BEGIN
-- insert audit record to MyAudit table
INSERT INTO MyAudit(MyTableId, User)
(SELECT MyTableId, CURRENT_USER FROM inserted)
END
BEGIN TRY
BEGIN TRANSACTION
INSERT INTO Users(ID, Name, Age)
VALUES(1, 'Bob', 24)
A database table should not be considered as just another table; it has to follow a set of rules to be considered
truly relational. Academically it is referred to as a 'relation' to make the distinction.
1. Each value is atomic; the value in each field in each row must be a single value.
2. Each field contains values that are of the same data type.
3. Each field heading has a unique name.
4. Each row in the table must have at least one value that makes it unique amongst the other records in the
table.
5. The order of the rows and columns has no significance.
Such a query allows users to rapidly find database tables containing columns of interest, such as when attempting
to relate data from two tables indirectly through a third table, without existing knowledge of which tables may
contain keys or other useful columns in common with the target tables.
Using T-SQL for this example, a database's information schema may be searched as follows:
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%Institution%'
The result contains a list of matching columns, their tables' names, and other useful information.
VT stands for 'Virtual Table' and shows how various data is produced as the query is processed
1. FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as
a result, virtual table VT1 is generated.
2. ON: The ON filter is applied to VT1. Only rows for which the join condition is TRUE are inserted into VT2.
3. OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the
preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows,
generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly
between the result of the last join and the next table in the FROM clause until all tables are processed.
4. WHERE: The WHERE filter is applied to VT3. Only rows for which the WHERE condition is TRUE are inserted into VT4.
5. GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY
clause. VT5 is generated.
6. CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.
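7. HAVING: The HAVING filter is applied to VT6. Only groups for which the HAVING condition is TRUE are inserted into VT7.
8. SELECT: The SELECT list is processed, generating VT8.
9. DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.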
10. ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A
cursor is generated (VC10).
11. TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is
generated and returned to the caller. LIMIT has the same functionality as TOP in some SQL dialects such as
Postgres and Netezza.
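The effect of this ordering can be observed by adding the clauses one at a time. The sketch below uses a hypothetical sales table in an in-memory SQLite database (where LIMIT plays the role of TOP); it illustrates the logical order only and says nothing about how an engine physically executes the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
# Hypothetical data.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 80), ("north", 50), ("north", -500),
     ("south", 90), ("south", 5),
     ("east", 200), ("east", 30)],
)

# WHERE is applied before GROUP BY: the -500 row never reaches its group.
after_group = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "WHERE amount > 0 GROUP BY region ORDER BY region"
).fetchall()

# HAVING is applied after GROUP BY: it filters whole groups, not rows.
after_having = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "WHERE amount > 0 GROUP BY region HAVING SUM(amount) > 100 ORDER BY region"
).fetchall()

# ORDER BY runs after SELECT, so it can use the alias; LIMIT is applied last.
final = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "WHERE amount > 0 GROUP BY region HAVING SUM(amount) > 100 "
    "ORDER BY total DESC LIMIT 1"
).fetchall()

print(after_group, after_having, final)
```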
Names should describe what is stored in their object. This implies that column names usually should be singular.
Whether table names should use singular or plural is a heavily discussed question, but in practice, it is more
common to use plural table names.
Keywords
SQL keywords are not case sensitive. However, it is common practice to write them in upper case.
At a minimum, put every clause on a new line, and split lines if they would otherwise become too long:
SELECT d.Name,
       COUNT(*) AS Employees
FROM Departments AS d
JOIN Employees AS e ON d.ID = e.DepartmentID
WHERE d.Name != 'HR'
GROUP BY d.Name
HAVING COUNT(*) > 10
ORDER BY COUNT(*) DESC;
Sometimes, everything after the SQL keyword introducing a clause is indented to the same column:
SELECT   d.Name,
         COUNT(*) AS Employees
FROM     Departments AS d
JOIN     Employees AS e ON d.ID = e.DepartmentID
WHERE    d.Name != 'HR'
GROUP BY d.Name
HAVING   COUNT(*) > 10
ORDER BY COUNT(*) DESC;
(This can also be done while aligning the SQL keywords right.)
SELECT
    d.Name,
    COUNT(*) AS Employees
FROM
    Departments AS d
JOIN
    Employees AS e
    ON d.ID = e.DepartmentID
WHERE
    d.Name != 'HR'
GROUP BY
    d.Name
HAVING
    COUNT(*) > 10
ORDER BY
    COUNT(*) DESC;
SELECT Model,
       EmployeeID
FROM Cars
WHERE CustomerID = 42
  AND Status = 'READY';
Using multiple lines makes it harder to embed SQL commands into other programming languages. However, many
languages have a mechanism for multi-line strings, e.g.,
@"..." in C#, """...""" in Python, or R"(...)" in C++.
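For instance, a Python triple-quoted string keeps the formatted query readable inside the host program. This is a small sketch that assumes an in-memory SQLite database and a few hypothetical Cars rows; the literal values are passed as bound parameters rather than embedded in the string.

```python
import sqlite3

# The query keeps its one-clause-per-line layout inside the triple-quoted string.
query = """
SELECT Model,
       EmployeeID
FROM Cars
WHERE CustomerID = ?
  AND Status = ?
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Cars (Model TEXT, EmployeeID INTEGER, CustomerID INTEGER, Status TEXT)"
)
# Hypothetical rows for illustration.
conn.executemany(
    "INSERT INTO Cars VALUES (?, ?, ?, ?)",
    [("Ford F-150", 2, 42, "READY"),
     ("Ford Mustang", 1, 42, "WORKING"),
     ("Toyota Prius", 3, 7, "READY")],
)

ready_cars = conn.execute(query, (42, "READY")).fetchall()
print(ready_cars)
```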
When using SELECT *, the data returned by a query can change whenever the table definition changes. This
increases the risk that different versions of your application or your database are incompatible with each other.
Furthermore, reading more columns than necessary can increase the amount of disk and network I/O.
So you should always explicitly specify the column(s) you actually want to retrieve:
--SELECT *                           don't
SELECT ID, FName, LName, PhoneNumber -- do
FROM Employees;
However, SELECT * does not hurt in the subquery of an EXISTS operator, because EXISTS ignores the actual data
anyway (it checks only if at least one row has been found). For the same reason, it is not meaningful to list any
specific column(s) for EXISTS, so SELECT * actually makes more sense:
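A minimal sketch of such a query follows. The Departments and Employees rows are hypothetical, and the script uses an in-memory SQLite database; it also shows that replacing * with a constant in the subquery does not change the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Departments (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Employees (ID INTEGER PRIMARY KEY, DepartmentID INTEGER)")
# Hypothetical data: Sales has no employees.
conn.executemany("INSERT INTO Departments VALUES (?, ?)",
                 [(1, "HR"), (2, "Sales"), (3, "Tech")])
conn.executemany("INSERT INTO Employees VALUES (?, ?)", [(1, 1), (2, 1), (3, 3)])

# EXISTS only checks whether the subquery returns at least one row.
with_star = conn.execute(
    """
    SELECT d.Name
    FROM Departments AS d
    WHERE EXISTS (SELECT *
                  FROM Employees AS e
                  WHERE e.DepartmentID = d.ID)
    ORDER BY d.Name
    """
).fetchall()

# Same result with a constant in the select list of the subquery.
with_constant = conn.execute(
    """
    SELECT d.Name
    FROM Departments AS d
    WHERE EXISTS (SELECT 1
                  FROM Employees AS e
                  WHERE e.DepartmentID = d.ID)
    ORDER BY d.Name
    """
).fetchall()

print(with_star)
```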
The join condition is somewhere in the WHERE clause, mixed up with any other filter conditions. This makes
it harder to see which tables are joined, and how.
Due to the above, there is a higher risk of mistakes, and it is more likely that they are found later.
In standard SQL, explicit joins are the only way to use outer joins:
SELECT d.Name,
       e.FName || e.LName AS EmpName
FROM Departments AS d
LEFT JOIN Employees AS e ON d.ID = e.DepartmentID;
SELECT RecipeID,
       Recipes.Name,
       COUNT(*) AS NumberOfIngredients
FROM Recipes
LEFT JOIN Ingredients USING (RecipeID)
GROUP BY RecipeID, Recipes.Name;
(This requires that both tables use the same column name.
USING automatically removes the duplicate column from the result, e.g., the join in this query returns a
single RecipeID column.)
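The column behaviour of USING can be checked by comparing the result columns of a USING join with those of an equivalent ON join. The sketch below uses an in-memory SQLite database; the Recipes and Ingredients rows, and the Ingredient column, are hypothetical. The last query counts a column from Ingredients rather than using COUNT(*), so a recipe without ingredients is reported as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Recipes (RecipeID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Ingredients (RecipeID INTEGER, Ingredient TEXT)")
# Hypothetical data: Toast has no ingredients recorded.
conn.executemany("INSERT INTO Recipes VALUES (?, ?)", [(1, "Pancakes"), (2, "Toast")])
conn.executemany("INSERT INTO Ingredients VALUES (?, ?)", [(1, "Flour"), (1, "Milk")])

# With USING, the shared column appears once in SELECT *.
using_cur = conn.execute(
    "SELECT * FROM Recipes LEFT JOIN Ingredients USING (RecipeID)"
)
using_cols = [col[0] for col in using_cur.description]

# With ON, both copies of the column are returned.
on_cur = conn.execute(
    "SELECT * FROM Recipes LEFT JOIN Ingredients "
    "ON Recipes.RecipeID = Ingredients.RecipeID"
)
on_cols = [col[0] for col in on_cur.description]

ingredient_counts = conn.execute(
    """
    SELECT RecipeID,
           Recipes.Name,
           COUNT(Ingredients.Ingredient) AS NumberOfIngredients
    FROM Recipes
    LEFT JOIN Ingredients USING (RecipeID)
    GROUP BY RecipeID, Recipes.Name
    ORDER BY RecipeID
    """
).fetchall()

print(using_cols, on_cols, ingredient_counts)
```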