0% found this document useful (0 votes)
41 views

SQL For Data Analysis Cheat Sheet Letter

Uploaded by

Sameer
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
41 views

SQL For Data Analysis Cheat Sheet Letter

Uploaded by

Sameer
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

SQL for Data Analysis Cheat Sheet

SQL GROUP BY ORDER BY JOIN


SQL, or Structured Query Language, is a language for talking PRODUCT Fetch product names sorted by the price column in the JOIN is used to fetch data from multiple tables. To get the
to databases. It lets you select specific data and build default ASCending order: names of products purchased in each order, use:
name category
complex reports. Today, SQL is a universal language of data, SELECT name SELECT
used in practically all technologies that process data. Knife Kitchen FROM product orders.order_date,
Pot Kitchen category count ORDER BY price [ASC]; product.name AS product,
amount
Mixer Kitchen Kitchen 3
SELECT Jeans Clothing Clothing 3
FROM orders
JOIN product
Fetch the id and name columns from the product table: Fetch product names sorted by the price column in
Sneakers Clothing Electronics 2 DESCending order: ON product.id = orders.product_id;
SELECT id, name
FROM product; Leggings Clothing SELECT name
FROM product
Smart TV Electronics ORDER BY price DESC;
Learn more about JOINs in our interactive SQL JOINs
Concatenate the name and the description to fetch the full Laptop Electronics
course.
description of the products:
SELECT name || ' - ' || description
FROM product;
COMPUTATIONS
Use +, -, *, / to do basic math. To get the number of seconds
AGGREGATE FUNCTIONS
Count the number of products:
in a week: INSERT
Fetch names of products with prices above 15: SELECT 60 * 60 * 24 * 7; To insert data into a table, use the INSERT command:
SELECT COUNT(*) -- result: 604800
SELECT name INSERT INTO category
FROM product;
FROM product VALUES
WHERE price > 15; (1, 'Home and Kitchen'),
ROUNDING NUMBERS (2, 'Clothing and Apparel');
Count the number of products with non-null prices: Round a number to its nearest integer:
Fetch names of products with prices between 50 and 150: SELECT COUNT(price)
SELECT name SELECT ROUND(1234.56789);
FROM product; -- result: 1235 You may specify the columns to which the data is added. The
FROM product
WHERE price BETWEEN 50 AND 150; remaining columns are filled with predefined default values
or NULLs.
Count the number of unique category values: Round a number to two decimal places: INSERT INTO category (name)
Fetch names of products that are not watches: SELECT COUNT(DISTINCT category) SELECT ROUND(AVG(price), 2) VALUES ('Electronics');
SELECT name FROM product; FROM product
FROM product WHERE category_id = 21;
WHERE name != 'watch'; -- result: 124.56
UPDATE
Get the lowest and the highest product price: To update the data in a table, use the UPDATE command:
Fetch names of products that start with a 'P' or end with an SELECT MIN(price), MAX(price) UPDATE category
's': FROM product; TROUBLESHOOTING SET
SELECT name is_active = true,
FROM product INTEGER DIVISION name = 'Office'
WHERE name LIKE 'P%' OR name LIKE '%s'; In PostgreSQL and SQL Server, the / operator performs
WHERE name = 'Ofice';
Find the total price of products for each category: integer division for integer arguments. If you do not see the
SELECT category, SUM(price) number of decimal places you expect, it is because you are
Fetch names of products that start with any letter followed by FROM product dividing between two integers. Cast one to decimal:
'rain' (like 'train' or 'grain'):
SELECT name
GROUP BY category; 123 / 2 -- result: 61
CAST(123 AS decimal) / 2 -- result: 61.5
DELETE
FROM product To delete data from a table, use the DELETE command:
WHERE name LIKE '_rain'; DELETE FROM category
Find the average price of products for each category whose WHERE name IS NULL;
average is above 3.0: DIVISION BY 0
Fetch names of products with non-null prices: SELECT category, AVG(price) To avoid this error, make sure the denominator is not 0. You
SELECT name FROM product may use the NULLIF() function to replace 0 with a NULL,
Check out our interactive course How to INSERT, UPDATE,
FROM product GROUP BY category which results in a NULL for the entire expression:
and DELETE Data in SQL.
WHERE price IS NOT NULL; HAVING AVG(price) > 3.0; count / NULLIF(count_all, 0)

LearnSQL.com is owned by Vertabelo SA


Learn the basics of SQL in our interactive SQL Basics course. vertabelo.com | CC BY-NC-ND Vertabelo SA
SQL for Data Analysis Cheat Sheet
DATE AND TIME SORTING CHRONOLOGICALLY INTERVALS EXTRACTING PARTS OF DATES
There are 3 main time-related types: date, time, and Using ORDER BY on date and time columns sorts rows An interval measures the difference between two points in The standard SQL syntax to get a part of a date is
timestamp. Time is expressed using a 24-hour clock, and it chronologically from the oldest to the most recent: time. For example, the interval between 2023-07-04 and SELECT EXTRACT(YEAR FROM order_date)
can be as vague as just hour and minutes (e.g., 15:30 – 3:30 SELECT order_date, product, quantity 2023-07-06 is 2 days. FROM sales;
p.m.) or as precise as microseconds and time zone (as shown FROM sales
below): ORDER BY order_date;
To define an interval in SQL, use this syntax:
2021-12-31 14:39:53.662522-05 INTERVAL '1' DAY You may extract the following fields:
order_date product quantity YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND.
date time
timestamp 2023-07-22 Laptop 2 The syntax consists of three elements: the INTERVAL
YYYY-mm-dd HH:MM:SS.ssssss±TZ 2023-07-23 Mouse 3 keyword, a quoted value, and a time part keyword. You may The standard syntax does not work In SQL Server.
2023-07-24 Sneakers 10 use the following time parts: YEAR, MONTH, DAY, HOUR, Use the DATEPART(part, date) function instead.
14:39:53.662522-05 is almost 2:40 p.m. CDT (e.g., in MINUTE, and SECOND. SELECT DATEPART(YEAR, order_date)
Chicago; in UTC it'd be 7:40 p.m.). The letters in the above 2023-07-24 Jeans 3
FROM sales;
example represent: 2023-07-25 Mixer 2
In the date part: In the time part: Adding intervals to date and time values
YYYY – the 4-digit HH – the zero-padded hour in a You may use + or - to add or subtract an interval to date or
year. 24-hour clock.
Use the DESCending order to sort from the most recent to the
timestamp values. GROUPING BY YEAR AND MONTH
oldest: Find the count of sales by month:
mm – the zero- MM – the minutes. SELECT order_date, product, quantity
padded month (01 SS – the seconds. Omissible. SELECT
FROM sales Subtract one year from 2023-07-05: EXTRACT(YEAR FROM order_date) AS year,
—January through ssssss – the smaller parts of a ORDER BY order_date DESC; SELECT CAST('2023-07-05' AS TIMESTAMP) EXTRACT(MONTH FROM order_date) AS month,
12—December). second – they can be expressed
- INTERVAL '1' year; COUNT(*) AS count
dd – the zero- using 1 to 6 digits. Omissible.
-- result: 2022-07-05 00:00:00
padded day. ±TZ – the timezone. It must COMPARING DATE AND TIME FROM sales
GROUP BY
start with either + or -, and use
two digits relative to UTC. VALUES Find customers who placed the first order within a month year,
month
Omissible. You may use the comparison operators <, <=, >, >=, and = to from the registration date:
SELECT id ORDER BY
compare date and time values. Earlier dates are less than
FROM customers year
CURRENT DATE AND TIME later ones. For example, 2023-07-05 is "less" than 2023-
month;
08-05. WHERE first_order_date >
Find out what time it is:
registration_date + INTERVAL '1' month;
SELECT CURRENT_TIME;
Find sales made in July 2023:
Get today's date: SELECT order_date, product_name, quantity year month count
Filtering events to those in the last 7 days
SELECT CURRENT_DATE; FROM sales 2022 8 51
To find the deliveries scheduled for the last 7 days, use:
In SQL Server: WHERE order_date >= '2023-07-01' 2022 9 58
SELECT delivery_date, address
SELECT GETDATE(); AND order_date < '2023-08-01';
FROM sales 2022 10 62
WHERE delivery_date <= CURRENT_DATE
Get the timestamp with the current date and time: Find customers who registered in July 2023: 2022 11 76
AND delivery_date >= CURRENT_DATE
SELECT CURRENT_TIMESTAMP; SELECT registration_timestamp, email - INTERVAL '7' DAY; 2022 12 85
FROM customer
2023 1 71
CREATING DATE AND TIME VALUES WHERE registration_timestamp >= '2023-07-
To create a date, time, or timestamp, write the value as a 01' Note: In SQL Server, intervals are not implemented – use the 2023 2 69
string and cast it to the proper type. AND registration_timestamp < '2023-08- DATEADD() and DATEDIFF() functions.
SELECT CAST('2021-12-31' AS date); 01';
SELECT CAST('15:31' AS time); Note that you must group by both the year and the month.
SELECT CAST('2021-12-31 23:59:29+02' Note: Pay attention to the end date in the query. The upper
Filtering events to those in the last 7 days
EXTRACT(MONTH FROM order_date) only extracts the
AS timestamp); bound '2023-08-01' is not included in the range. The in SQL Server month number (1, 2, ..., 12). To distinguish between months
SELECT CAST('15:31.124769' AS time); timestamp '2023-08-01' is actually the timestamp To find the sales made within the last 7 days, use: from different years, you must also group by year.
'2023-08-01 00:00:00.0'. The comparison operator SELECT delivery_date, address
Be careful with the last example – it is interpreted as 15 < is used to ensure the selection is made for all timestamps FROM sales
minutes 31 seconds and 124769 microseconds! It is always a less than '2023-08-01 00:00:00.0', that is, all WHERE delivery_date <= GETDATE()
More about working with date and time values in our
good idea to write 00 for hours explicitly: timestamps in July 2023, even those close to the midnight of AND delivery_date >=
interactive Standard SQL Functions course.
'00:15:31.124769'. August 1, 2023. DATEADD(DAY, -7, GETDATE());

LearnSQL.com is owned by Vertabelo SA


LearnSQL.com – Practical SQL Courses for Teams and Individuals vertabelo.com | CC BY-NC-ND Vertabelo SA
SQL for Data Analysis Cheat Sheet
CASE WHEN GROUP BY EXTENSIONS COALESCE RANKING
CASE WHEN lets you pass conditions (as in the WHERE GROUPING SETS COALESCE replaces the first NULL argument with a given Rank products by price:
clause), evaluates them in order, then returns the value for value. It is often used to display labels with GROUP BY SELECT RANK() OVER(ORDER BY price), name
GROUPING SETS lets you specify multiple sets of columns
the first condition met. extensions. FROM product;
to group by in one query.
SELECT region, RANKING FUNCTIONS
SELECT region, product, COUNT(order_id)
COALESCE(product, 'All'), RANK – gives the same rank for tied values, leaves gaps.
SELECT FROM sales
COUNT(order_id) DENSE_RANK – gives the same rank for tied values without
name, GROUP BY
FROM sales gaps.
CASE GROUPING SETS ((region, product), ());
GROUP BY ROLLUP (region, product); ROW_NUMBER – gives consecutive numbers without gaps.
WHEN price > 150 THEN 'Premium'
WHEN price > 100 THEN 'Mid-range' region product count region product count
name rank dense_rank row_number
ELSE 'Standard' USA Laptop 10 USA Laptop 10
Jeans 1 1 1
END AS price_category GROUP BY (region,
USA Mouse 5 USA Mouse 5
FROM product; product) Leggings 2 2 2
UK Laptop 6 USA All 15
Leggings 2 2 3
NULL NULL 21 GROUP BY () – all rows UK Laptop 6
Here, all products with prices above 150 get the Premium Sneakers 4 3 4
label, those with prices above 100 (and below 150) get the UK All 6
Sneakers 4 3 5
Mid-range label, and the rest receives the Standard label. CUBE All All 21
Sneakers 4 3 6
CUBE generates groupings for all possible subsets of the
CASE WHEN and GROUP BY GROUP BY columns. COMMON TABLE EXPRESSIONS T-Shirt 7 4 7
SELECT region, product, COUNT(order_id) A common table expression (CTE) is a named temporary
You may combine CASE WHEN and GROUP BY to compute
FROM sales result set that can be referenced within a larger query. They RUNNING TOTAL
object statistics in the categories you define. A running total is the cumulative sum of a given value and all
GROUP BY CUBE (region, product); are especially useful for complex aggregations and for
SELECT preceding values in a column.
breaking down large queries into more manageable parts.
CASE SELECT date, amount,
region product count WITH total_product_sales AS (
WHEN price > 150 THEN 'Premium' SUM(amount) OVER(ORDER BY date)
SELECT product, SUM(profit) AS
WHEN price > 100 THEN 'Mid-range' USA Laptop 10 AS running_total
GROUP BY region, total_profit
ELSE 'Standard' USA Mouse 5 FROM sales;
product FROM sales
END AS price_category,
UK Laptop 6 GROUP BY product MOVING AVERAGE
COUNT(*) AS products
)
FROM product USA NULL 15 A moving average (a.k.a. rolling average, running average) is a
GROUP BY region technique for analyzing trends in time series data. It is the
GROUP BY price_category; UK NULL 6 SELECT AVG(total_profit)
average of the current value and a specified number of
NULL Laptop 16 FROM total_product_sales;
GROUP BY product preceding values.
Count the number of large orders for each customer using NULL Mouse 5 SELECT date, price,
Check out our hands-on courses on Common Table
CASE WHEN and SUM(): AVG(price) OVER(
NULL NULL 21 GROUP BY () – all rows Expressions and GROUP BY Extensions.
SELECT ORDER BY date
customer_id,
SUM( ROLLUP
WINDOW FUNCTIONS ROWS BETWEEN 2 PRECEDING
Window functions compute their results based on a sliding AND CURRENT ROW
CASE WHEN quantity > 10 ROLLUP adds new levels of grouping for subtotals and grand ) AS moving_averge
THEN 1 ELSE 0 END window frame, a set of rows related to the current row. Unlike
totals. FROM stock_prices;
) AS large_orders aggregate functions, window functions do not collapse rows.
SELECT region, product, COUNT(order_id)
FROM sales FROM sales
COMPUTING THE PERCENT OF TOTAL WITHIN A GROUP DIFFERENCE BETWEEN TWO ROWS
GROUP BY customer_id; SELECT product, brand, profit,
GROUP BY ROLLUP (region, product);
(100.0 * profit / (DELTA)
SUM(profit) OVER(PARTITION BY brand) SELECT year, revenue,
... or using CASE WHEN and COUNT(): region product count LAG(revenue) OVER(ORDER BY year)
) AS perc
SELECT USA Laptop 10 FROM sales; AS revenue_prev_year,
customer_id, GROUP BY region, revenue -
USA Mouse 5 product brand profit perc
COUNT( product LAG(revenue) OVER(ORDER BY year)
CASE WHEN quantity > 10 UK Laptop 6 Knife Culina 1000 25 AS yoy_difference
THEN order_id END USA NULL 15 Pot Culina 3000 75 FROM yearly_metrics;
) AS large_orders GROUP BY region
UK NULL 6 Doll Toyze 2000 40 Learn about SQL window functions in our interactive
FROM sales
NULL NULL 21 GROUP BY () – all rows Car Toyze 3000 60 Window Functions course.
GROUP BY customer_id;

LearnSQL.com is owned by Vertabelo SA


Learn more in our interactive Creating Basic SQL Reports course vertabelo.com | CC BY-NC-ND Vertabelo SA

You might also like