0% found this document useful (0 votes)
0 views

SQL - Final

The document provides an overview of SQL for analytics, covering topics such as database management, SQL command categories, data types, and table operations. It explains the importance of primary and foreign keys, as well as how to create, insert, update, and delete records in a database. Additionally, it introduces concepts like joins, sorting, and constraints, offering practical examples and exercises for better understanding of SQL functionalities.

Uploaded by

Vivek Kamble
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
0 views

SQL - Final

The document provides an overview of SQL for analytics, covering topics such as database management, SQL command categories, data types, and table operations. It explains the importance of primary and foreign keys, as well as how to create, insert, update, and delete records in a database. Additionally, it introduces concepts like joins, sorting, and constraints, offering practical examples and exercises for better understanding of SQL functionalities.

Uploaded by

Vivek Kamble
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 60

SQL

for Analytics

SATYAJIT PATTNAIK
SQL Timeline

Introduction Statements & Sorting


Queries Results

Joins

Table Basics Advanced


& Selecting Data Queries
SQL Timeline Contd..

Agg Functions Nested Queries


Having clause
& Views

Normalization

Ordering Regex
Database

Organised collection of Data stored in an electronic format


Database Management System

A System Software for creating and managing database


Database Architecture

Database
Architecture

File Server Client Server


Database Architecture

File Server Client Server

✔ Client request a file from the disk or a ✔ Desktop software requests data from the
file-server database
✔ Software opens the files, stores it in ✔ Data is held in memory but when a change
memory, makes changes in memory, and is made, a request (SQL statement) is
then saves it back to the disk drive (or file generated.
server) ✔ When you save the data these requests
✔ What happens if someone else opens the are passed to the database in the order
same file after you do, then you save your they were generated.
changes, and then they save their ✔ So, multiple people can work in the same
changes? table at the same time with no worries
○ Potential data loss ○ Risk of data loss is low
○ Solution is file-locking ○ Still possible for someone else to
○ File locking is very inconvenient change data after you change it.
however as it means that nobody ○ Versioning is an option to record
else can edit the file if someone the changes (DB Admins are
else has it open first. (Logically owner of these things in
good for small companies) production)
SQL Command Categories

Data Query Data Definition Data Manipulation Data Control


Language Language Language Language

Create Table Insert Grant


Select

Alter Table Update Revoke

Drop Table Delete


Data Types in SQL
Defines the type of data the column holds

Integer Character
Different Data Types

1 Numeric

Character
2
3 Date & Time
Introduction
What is SQL?
✔ SQL (pronounced "ess-que-el") stands for Structured Query Language
✔ SQL is used to communicate with a database.
✔ It is the standard language for relational database management systems
✔ SQL statements are used to perform tasks such as update data on a database, or retrieve data from a
database. Some common relational database management systems that use SQL are: Oracle, Sybase,
Microsoft SQL Server, Access, Ingres, etc.
✔ The standard SQL commands such as "Select", "Insert", "Update", "Delete", "Create", and "Drop" can be
used to accomplish almost everything that one needs to do with a database.
✔ SQL programming can be used to perform multiple actions on data such as :
○ Querying
○ Inserting
○ Updating
○ Deleting
○ Extracting etc.
Primary & Foreign Key

A Primary key is used to ensure data in the specific column is Unique and Not Null

Primary key Unique Not Null

A foreign key is a column or group of columns in a relational database table that provides a link
between data in two tables. It is a column (or columns) that references a column (most often the
primary key) of another table
Primary & Foreign Key - Contd..
Primary key:
○ no null values
○ unique identification

Foreign key:
○ correspond to the values of the primary key in another table
SQL – First Step
Create a Database
Create a Table
Insert Table Data
Update Table Data
Alter Database
Alter Table
Alter Column
Duplicate a Database
Drop a Database
Database Backup
Restore a Database
Rename Database
TABLE BASICS
A relational database system contains one or more objects called tables. The data or information for the
database are stored in these tables. Tables are uniquely identified by their names and are comprised of
columns and rows. Columns contain the column name, data type, and any other attributes for the column.
Rows contain the records or data for the columns. Here is a sample table called "employee".
The LIKE pattern matching operator can also be used in the conditional selection of the where clause. Like is a
very powerful operator that allows you to select only rows that are "like" what you specify. The percent sign
"%" can be used as a wildcard to match any possible character that might appear before or after the
characters specified.

This SQL statement will match any first names that For example:
start with 'Er'. Strings must be in single quotes. select first, last, city
Or you can specify: from empinfo
where first LIKE 'Er%';

This statement will match any select first, last


last names that end in a 's'. from empinfo
where last LIKE '%s';

This will only select rows where the select * from empinfo
first name equals 'Eric' exactly where first = 'Eric';
SELECTING THE DATA

Sample Table: empinfo

first last id age city state

John Jones 99980 45 Payson Arizona

Mary Jones 99982 25 Payson Arizona

Eric Edwards 88232 32 San Diego California

Mary Ann Edwards 88233 32 Phoenix Arizona

Ginger Howell 98002 42 Cottonwood Arizona

Sebastian Smith 92001 23 Gila Bend Arizona

Gus Gray 22322 35 Bagdad Arizona

Mary Ann May 32326 52 Tucson Arizona

Erica Williams 32327 60 Show Low Arizona

Leroy Brown 32380 22 Pinetop Arizona

Elroy Cleaver 32382 22 Globe Arizona


Enter the following sample select first, last, city from empinfo;
select statements in the select last, city, age from empinfo
SQL Interpreter Form at the where age > 30;
bottom of this page. Before
select first, last, city, state from empinfo
you press "submit", write where first LIKE 'J%';
down your expected results.
select * from empinfo;
Press "submit", and
compare the results. select first, last, from empinfo
where last LIKE '%s';

select first, last, age from empinfo


where last LIKE '%illia%';

select * from empinfo where first = 'Eric';


CREATING TABLES
The create table statement is used to create a new table. Here is create table "tablename"
("column1" "data type",
the format of a simple create table statement:
"column2" "data type",
"column3" "data type");

create table "tablename"


("column1" "data type"
[constraint],
Format of create table if you were to use optional constraints: "column2" "data type"
[constraint],
"column3" "data type"
[constraint]);
[ ] = optional

Example:

create table employee


Note: You may have as many columns as you'd like, and the (first varchar(15),
constraints are optional. last varchar(20),
age number(3),
address varchar(30),
city varchar(20),
state varchar(20));
To create a new table,

❏ enter the keywords create table followed by the table name,


❏ followed by an open parenthesis,
❏ followed by the first column name,
❏ followed by the data type for that column,
❏ followed by any optional constraints, and followed by a closing parenthesis.

It is important to make sure you use an open parenthesis before the beginning table, and a closing
parenthesis after the end of the last column definition.

Make sure you separate each column definition with a comma. All SQL statements should end with a ";".
The table and column names;

● must start with a letter


● and can be followed by letters, numbers, or underscores
● not to exceed a total of 30 characters in length.

Do not use any SQL reserved keywords as names for tables or column names (such as "select", "create", "insert", etc).

Data types specify what the type of data can be for that particular column. If a column called "Last_Name", is to be used to
hold names, then that particular column should have a "varchar" (variable-length character) data type.
Here are the most common Data types:

char(size) Fixed-length character string. Size is specified in parenthesis. Max 255


bytes.

varchar(size Variable-length character string. Max size is specified in parenthesis.


)

number(size) Number value with a max number of column digits specified in


parenthesis.

date Date value

number(size, Number value with a maximum number of digits of "size" total, with a
maximum number of "d" digits to the right of the decimal.
d)
What are constraints?
When tables are created, it is common for one or more columns to have constraints associated with them. A constraint is
basically a rule associated with a column that the data entered into that column must follow.

- For example, a "unique" constraint specifies that no two records can have the same value in a particular column. They
must all be unique.
- The other two most popular constraints are "not null" which specifies that a column can't be left blank, and "primary
key". A "primary key" constraint defines a unique identification of each record (or row) in a table.

All of these and more will be covered in the future Advanced release of this Tutorial. Constraints can be entered in this SQL
interpreter, however, they are not supported in this Intro to SQL tutorial & interpreter. They will be covered and supported in the
future release of the Advanced SQL tutorial - that is, if "response" is good.

It's now time for you to design and create your own table. You will use this table throughout the rest of the tutorial. If you decide
to change or redesign the table, you can either drop it and recreate it or you can create a completely different one. The SQL
statement drop will be covered later.
Create Table Exercise
You have just started a new company. It is time to hire some employees. You will need to create a
table that will contain the following information about your new employees:

firstname, last name, title, age, and salary.

After you create the table, you should receive a small form on the screen with the appropriate column
names. If you are missing any columns, you need to double check your SQL statement and recreate
the table. Once it's created successfully, go to the "Insert" lesson.

IMPORTANT: When selecting a table name, it is important to select a unique name that no one else
will use or guess. Your table names should have an underscore followed by your initials and the digits
of your birth day and month.

For example, Tom Smith, who was born on November 2nd, would name his table
myemployees_ts0211 Use this convention for all of the tables you create. Your tables will remain on a
shared database until you drop them, or they will be cleaned up if they aren't accessed in 4-5 days
INSERT
The insert statement is used to insert or add a
insert into "tablename"
row of data into the table.
(first_column,...last_column)
To insert records into a table, enter the key words values (first_value,...last_value);
insert into followed by the table name, followed by
an open parenthesis, followed by a list of column
names separated by commas, followed by a closing Example:
parenthesis, followed by the keyword values,
insert into empinfo
followed by the list of values enclosed in
(first, last, id, age, city, state)
parenthesis. The values that you enter will be held
values ('Luke', 'Duke', 45454,
in the rows and they will match up with the column
'22', 'Hazard Co', 'Georgia');
names that you specify.
Note: All strings should be enclosed
Strings should be enclosed in single quotes,
between single quotes: 'string'
and numbers should not.
Insert statement exercise
It is time to insert data into your new employee table.
Your first three employees are the following:
Jonie Weber, Secretary, 28, 19500.00
Potsy Weber, Programmer, 32, 45300.00
Dirk Smith, Programmer II, 45, 75020.00
Enter these employees into your table first, and then insert at least 5 more of your own list of employees in the table.
After they're inserted into the table, enter select statements to:
Select all columns for everyone in your employee table.
Select all columns for everyone with a salary over 30000.
Select first and last names for everyone that's under 30 years old.
Select first name, last name, and salary for anyone with "Programmer" in their title.
Select all columns for everyone whose last name contains "ebe".
Select the first name for everyone whose first name equals "Potsy".
Select all columns for everyone over 80 years old.
Select all columns for everyone whose last name ends in "ith".
Create at least 5 of your own select statements based on specific information that you'd like to retrieve.
UPDATE
The update statement is used to update or change records that match a
specified criteria. This is accomplished by carefully constructing a where
clause.
Examples:
update "tablename"
set "columnname" = update phone_book
"newvalue"
set area_code = 623 update employee
[,"nextcolumn" =
"newvalue2"...] where prefix = 979; set age = age+1
where "columnname" where first_name='Mary' and
update phone_book
OPERATOR "value" last_name='Williams';
[and|or "column" set last_name = 'Smith',
OPERATOR "value"]; prefix=555, suffix=9292
[] = optional where last_name = 'Jones';
Update statement exercises
After each update, issue a select statement to verify your changes.

1. Jonie Weber just got married to Bob Williams. She has requested that her last name be updated to
Weber-Williams.
2. Dirk Smith's birthday is today, add 1 to his age.
3. All secretaries are now called "Administrative Assistant". Update all titles accordingly.
4. Everyone that's making under 30000 are to receive a 3500 a year raise.
5. Everyone that's making over 33500 are to receive a 4500 a year raise.
6. All "Programmer II" titles are now promoted to "Programmer III".
7. All "Programmer" titles are now promoted to "Programmer II".

Create at least 5 of your own update statements and submit them.


DELETING RECORDS
The delete statement is used to delete records or rows from
the table.
Examples: delete from "tablename"
where "columnname"
delete from employee;
OPERATOR "value"
Note: if you leave off the where clause, all records will be deleted! [and|or "column"
delete from employee OPERATOR "value"];
where lastname = 'May'; [ ] = optional

delete from employee


where firstname = 'Mike' or firstname = 'Eric';
To delete an entire record/row from a table, enter "delete from" followed by the
table name, followed by the where clause which contains the conditions to delete. If
you leave off the where clause, all records will be deleted.
DROP A TABLE
● The drop table command is used to delete a table
and all rows in the table.
● To delete an entire table including all of its rows,
issue the drop table command followed by the
drop table "tablename"
tablename. Example:
● drop table is different from deleting all of the
records in the table. drop table
● Deleting all of the records in the table leaves the myemployees_ts0211;
table including column and constraint information.
● Dropping the table removes the table definition as
well as all of its rows.
JOINS
● In order to sort a table, we need to use the
ORDER BY Clause
● In order to get the results sorted based on certain
select field1, field2 from
columns, we need to use this. Usage of asc or
tablename where <clause>
desc has to be defined to have the results in
order by field1
ascending or descending orders. Default value
being asc (ascending)
What are constraints?
● In most of the real world problems, we might need data from multiple tables, that’s
where Joins comes into picture.
JOINS - Right Join

● The RIGHT JOIN


keyword returns all SELECT a.first, a.last,
b.proj_name
records from the right
FROM empinfo a RIGHT
table (table2), and the JOIN PROJECT b
matched records from ON a.id = b.id;
the left table (table1).
JOINS - Left Join

● The LEFT JOIN


keyword returns all SELECT a.first, a.last,
b.proj_name
records from the left
FROM empinfo a LEFT JOIN
table (table1), and the PROJECT b
matched records from ON a.id = b.id;
the right table (table2).
The result is NULL from
the right side, if there is
no match.
JOINS - Inner Join

● The INNER Join


keyword selects SELECT a.first, a.last,
b.proj_name
records that have
FROM empinfo a INNER JOIN
matching values in both PROJECT b
tables ON a.id = b.id;
Alter Table

ALTER TABLE
tablename ALTER TABLE
CUST_DETAILS
ADD columnname
ADD AGE INT;
datatype;
ALTER TABLE
ALTER TABLE CUST_DETAILS
tablename DROP COLUMN AGE;
DROP COLUMN
columnname ;
Data Importing
1. Manual Importing
Data Importing
1. Importing through command line

● Open MySQL Workbench, Create a new database to store the tables you'll import (eg- FacilitySerivces) → Then create the table using CREATE query
● Copy the MySQL bin directory path: C:\Program Files\MySQL\MySQL Server 8.0\bin
● Go to the folder in command line by using: cd path
● Connect to MySQL database: mysql -u root -p (root is basically your username)
● If you are logged in successfully, then set the global variables by using below command so that the data can be imported from local computer folder.
○ mysql> SET GLOBAL local_infile = 1;
○ Query OK, 0 rows affected (0.00 sec)
○ (you've just instructed MySQL server to allow local file upload from your computer)
● Quit current server connection (mysql> quit)
● Load the file from CSV file to the MySQL database. In order to do this, please follow the commands: (We'll connect with the MySQL server again with the local-infile system
variable. This basically means you want to upload data into a file from a local machine)
○ mysql --local-infile=1 -u root -p (give password)
○ Show Databases; (It'll show all the databases in MySQL server.)
○ mysql> USE dbase; (makes the database that you had created in step 1 as default schema to use for the next sql scripts)

mysql> LOAD DATA LOCAL INFILE 'fullpath\\file.csv'


INTO TABLE tablename
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n' IGNORE 1 ROWS;

● Note: VERY IMP - Please replace single backward (\) slash in the path with double back slashes (\\) instead of single slash
Data Exporting
Server → Data Export
Aggregate Functions
✔ Sometimes we examine & analyse data of varying magnitudes, hence we realise
the need of grouping similar types of values together & look them at as one bunch.

For example: Consider a table containing data consisting of the marks


scored by students in their 12th board exams. While you would want to know
how the students performed in all the subjects put together, it is equally
important to see how they performed in each subject. You can gain even
further insights if you group these students by state. Hence, it is imperative that
you learn the usage of aggregate functions in your queries.

✔ groupby() → To aggregate values of a column C1 ‘grouped by’ a certain column


C2.

✔ count() → Count the number of rows.

✔ min(), max()→ Finding the minimum & maximum values for a particular column

For example: select min(score) from employee;

✔ avg() → Find the average


ORDERING
Quite often, you would want to display the retrieved records in a particular order, for example,
in increasing order of income, joining date, alphabetical order, etc. This is commonly useful
when you are making a report or presenting the data to someone else.

select firstName from


employees order by
firstName asc limit 3;
HAVING CLAUSE
You have already learnt how to filter individual values based on a given condition.
But how do you do this on grouped values?

Suppose your manager asks you to count all the employees whose salaries are more than the average
salary in that particular department.

Now, intuitively, you know that two aggregate functions would be used here, namely, count() and avg(). You
decide to apply the 'where' condition on the average salary of the department, but to your surprise, the query
fails. This is exactly what the having clause is for.

The 'having' clause is typically used when you have to apply a filter condition on an 'aggregated value'. This
is because the 'where' clause is applied before aggregation takes place, and thus, it is not useful when you
want to apply a filter on an aggregated value.

The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate
functions.
Exercise
Keywords in SQL

List the keywords below in order of their occurrence in a query:

1. group by
2. order by
3. select
4. where
5. from
6. limit
7. having

Answer is:
STRING FUNCTIONS
Used to manipulate the string data and make it more understandable for analysis.

For example: amitabhbachchan, or Amitabh Bachchan, which one of them is more readable, obviously the
later one right.

concat → Concatenates two strings


substr/substring → Extracts a substring from a string
upper → Converts a string to uppercase
lower → Converts a string to lowercase
character_length → Calculates the length of the string variable.
mid → find the middle elements → MID(str,pos,len)

https://www.w3schools.com/sql/sql_ref_mysql.asp
DATE & TIME FUNCTIONS
Used to manipulate the date & time columns

For example: If you want to change the date format, or just want to see the exact day, or so on.

Let’s create a transaction_details table and play around.

datediff → Return the number of days between the two date values
SELECT DATEDIFF(sysdate(), order_date) from transaction_details
date_format → Format a date variable
SELECT DATE_FORMAT("2017-06-15", "%Y");
day → Return the day of the month for a date.
SELECT DAY("2017-06-15");
quarter → Return the quarter of the year for a date.
SELECT QUARTER("2017-06-15");
addday → Add days to the date variable
SELECT ADDDATE("2017-06-15", INTERVAL 10 DAY);

and so on….
https://www.w3schools.com/sql/func_mysql_adddate.asp
REGEX
So far you have already known about the wildcards like “like” operator, but in cases wildcards may fall short
for some advanced use cases, regular expressions comes into picture.

Regex, or Regular expressions, is a sequence of characters, used to search and locate specific sequences
of characters that match a pattern.

Example 1: Match beginning of the name

select * from customer WHERE email REGEXP '^w'


REGEX
Example 2: Find the customers, which email address having a ‘z’, ‘v’ or ‘p’

SELECT * FROM customer


WHERE email REGEXP "[zvp]";

Example 3: Find the customers, which email address containing characters from ‘x’ to ‘z’

SELECT * FROM customer


WHERE email REGEXP "[x-z]";
NESTED QUERIES
By now, you know that a database is a
collection of multiple related tables. Now,
while generating insights from data, you SELECT lastName,firstName
may need to refer to these multiple tables FROM employees
in a query. There are two ways to deal with WHERE
office_Code IN (SELECT
such types of queries. These include the
office_Code
following:
FROM
1. Joins offices
2. Nested queries/Subqueries WHERE
country = 'USA');

A subquery is called an inner query while


the query that contains the subquery is
called an outer query. A subquery can be
used anywhere that expression is used and
must be closed in parentheses.
STORED PROCEDURES
A stored procedure is a prepared SQL code that
you can save, so the code can be reused over and
over again.
CREATE PROCEDURE
So if you have an SQL query that you write over SelectAllCustomers
and over again, save it as a stored procedure, and AS
then just call it to execute it. SELECT * FROM
Customers
You can also pass parameters to a stored
GO;
procedure, so that the stored procedure can act
based on the parameter value(s) that is passed.

.
TRIGGERS
A trigger is a stored procedure in database
which automatically invokes whenever a
create trigger [trigger_name]
special event in the database occurs. For [before | after]
example, a trigger can be invoked when a {insert | update | delete}
row is inserted into a specified table or on [table_name]
when certain table columns are being [for each row]
updated. [trigger_body]

● DDL Trigger
● DML Trigger
● Logon Trigger
VIEWS
Views are virtual tables that do not store any data
of their own but display data stored in other
tables. create view newEmp as SELECT
lastName,firstName
Advantages: FROM employees
WHERE
● Hide the complexity of data.
office_Code IN (SELECT
● Act as aggregated tables.
office_Code
● If you are doing an user level access FROM
control, you can give an user access to a offices
view without giving access to the tables WHERE
behind it country = 'USA');
● It can allow for massive performance
improvements.
VIEWS
Disadvantages:
● If done wrong, it can result in performance create view newEmp as SELECT
issues. lastName,firstName
● You may not be able to update the view, FROM employees
forcing you back to the original tables. WHERE
office_Code IN (SELECT
office_Code
FROM
offices
WHERE
country = 'USA');
VIEWS
WINDOW FUNCTION
Window functions applies aggregate and ranking
functions over a particular window (set of rows). SELECT coulmn_name1,
OVER clause is used with window functions to window_function(cloumn_name2),
OVER([PARTITION BY column_name1] [ORDER BY
define that window. OVER clause does two column_name3]) AS new_column
things : FROM table_name;
window_function= any aggregate or ranking function
● Partitions rows into form set of rows. column_name1= column to be selected
coulmn_name2= column on which window function is
(PARTITION BY clause is used) to be applied
column_name3= column on whose basis partition of
● Orders rows within those partitions into a
rows is to be done
particular order. (ORDER BY clause is new_column= Name of new column
table_name= Name of table
used)
WINDOW FUNCTION

● Row_number() → Gives a sequential integer to every row within


its partition
● Rank() functions → Ranking records
● First_value() functions → Returns the value of the specific
expression with respect to first row in the window frame
SQL-Python Connectivity

● pymysql → This package contains a pure-Python MySQL client library, based on PEP 249
→ pip install pymysql

dbcon = pymysql.connect(host="localhost",user= "root",password= "root",database= "breast_cancer")


SUMMARY
You learnt about the crux of SQL queries. These are the basic to intermediate level concepts of SQL that will
be tested rigorously in your interviews. Hence, practice these points again and again to gain a deep
understanding of the several SQL statements.

Leave your top three takeaways from this session in the comments section below.
Exercise
Write a query to retrieve the names of all employees who have an age greater than what Gus Gray has.

Answer is:
Exercise
Write a query to retrieve the names of all employees who have an age greater than what Gus
Gray has, and store the output in a view.

Answer is:
Exercise
Write a query to retrieve the names of all employees who have an age greater than what Gus Gray has, and
store the output in a view.

Answer is:

You might also like