ALBS - SAS Module 7
Module - 7
Disclaimer: This material is protected under copyright act AnalytixLabs ©, 2011. Unauthorized use and/ or duplication of this
material or any part of this material including data, in any form without explicit and written permission from AnalytixLabs is
strictly prohibited. Any violation of this copyright will attract legal actions.
Learn to Evolve
Database
Database Terminology
3. Null Values - missing value. It is not the same as a blank or zero value.
4. PROC SQL executes each statement as soon as it is submitted, without a RUN statement; use the QUIT statement to terminate the procedure.
Creating Table (1/2)
• Enables you to create tables without rows from column definitions or to create tables from a query
result
Syntax
Create table <table name>
(
<column 1> <data type> [not null] [unique] [<column constraint>],
.........
<column n> <data type> [not null] [unique] [<column constraint>]
);
where
• Table name is the name of the table
• Column 1 to column n are the names of the columns
Creating Table (2/2)
• NOT NULL is a constraint which specifies that the column’s value cannot be left missing (null)
• UNIQUE specifies that no two rows can have the same value in this column. Unless NOT NULL is also specified for this column, null values are allowed, and two rows that both have a null value in this column do not violate the constraint
• Rules
– For each column, a name and a data type must be specified and the column name must be unique
within the table definition
Example:
• Creates a table called emp in which the columns EMPNO and ENAME cannot contain null values
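The slide's emp example itself is not reproduced here. As a minimal sketch of the NOT NULL and UNIQUE rules described above, the block below uses Python's built-in sqlite3 module (SQLite syntax, not SAS PROC SQL). The EMAIL column and all row values are hypothetical, added only to show that a nullable UNIQUE column accepts two nulls in SQLite.

```python
import sqlite3


def build_emp():
    """Create an in-memory emp table; the EMAIL column is hypothetical."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """
        create table emp (
            empno integer not null unique,  -- required and unique
            ename text    not null,         -- required
            email text    unique            -- optional, but unique when present
        )
        """
    )
    return con


def try_insert(con, row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        con.execute("insert into emp values (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


con = build_emp()
print(try_insert(con, (1, "SMITH", None)))  # True
print(try_insert(con, (2, None, None)))     # False: ENAME is NOT NULL
print(try_insert(con, (1, "JONES", None)))  # False: EMPNO must be unique
print(try_insert(con, (3, "ALLEN", None)))  # True: two null EMAILs do not clash
```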
Creating Tables from a Query Result
• Use a CREATE TABLE statement, and place it before the SELECT statement
• Data is derived from the table that is referenced in the query’s FROM clause
• New table’s column names are as specified in the query’s SELECT clause list
Syntax
Create table <table name1> as
Select <column name1>, <column name2>, ………..
From <table name2>;
where,
• Column name is the name of a column to be included in the new table
• Table name2 is the name of the table from which the data is to be selected
• Table name1 is the new table to be created
Creating Tables from a Query Result
Example:
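The original example for this slide is not available. The block below is a minimal sketch of CREATE TABLE ... AS SELECT in SQLite through Python's sqlite3 module (not SAS). The countries table, its rows and the new table name europe are hypothetical. It checks that the new table's columns come from the SELECT list and its rows from the FROM table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical source table playing the role of <table name2>.
con.execute("create table countries (name text, continent text, population integer)")
con.executemany(
    "insert into countries values (?, ?, ?)",
    [("Albania", "Europe", 3407400),
     ("Andorra", "Europe", 64634),
     ("Bahrain", "Asia", 591800)],
)

# The new table (<table name1>) takes its columns from the SELECT list
# and its rows from the table named in the FROM clause.
con.execute(
    """
    create table europe as
    select name, population
    from countries
    where continent = 'Europe'
    """
)

columns = [info[1] for info in con.execute("pragma table_info(europe)")]
rows = con.execute("select * from europe order by name").fetchall()
print(columns)  # ['name', 'population']
print(rows)     # [('Albania', 3407400), ('Andorra', 64634)]
```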
Select statement
• Retrieves data from tables
Syntax:
Select <column name1>, <column name2>, ………..
From <table name>;
Example:
proc sql;
title 'U.S. Cities with Their States and Coordinates';
select *
from day1.uscitycoords;
quit;
Retrieving Data
’U.S. Cities with Their States and Coordinates’
City State Latitude Longitude
------------------------------------------------
Albany NY 43 -74
Albuquerque NM 36 -106
Amarillo TX 35 -102
Anchorage AK 61 -150
Annapolis MD 39 -77
Atlanta GA 34 -84
• Use the keyword DISTINCT in the SELECT clause to display the unique values in a column
• When all of a table’s columns are specified in a SELECT clause with the DISTINCT keyword, duplicate rows (rows in which the values in all of the columns match) are eliminated
Syntax:
Select distinct <variable name>,……..
from <table name>;
where,
• Variable name is the variable to be selected
Example:
proc sql;
title 'Continents of the United States';
select distinct Continent
from day1.unitedstates;
quit;
Continents of the United States
Continent
-----------------------------------
North America
Oceania
• Unique values in the variable continent will be displayed
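A minimal sketch of the same idea in SQLite via Python's sqlite3 module (not SAS). The three rows are hypothetical stand-ins for day1.unitedstates. Three input rows collapse to two distinct continents.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical rows loosely modeled on day1.unitedstates (name, continent).
con.execute("create table unitedstates (name text, continent text)")
con.executemany(
    "insert into unitedstates values (?, ?)",
    [("Alabama", "North America"),
     ("Alaska", "North America"),
     ("Hawaii", "Oceania")],
)

# DISTINCT keeps one row per unique continent value.
distinct_continents = [r[0] for r in con.execute(
    "select distinct continent from unitedstates order by continent")]
all_rows = con.execute("select continent from unitedstates").fetchall()
print(distinct_continents)  # ['North America', 'Oceania']
print(len(all_rows))        # 3
```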
• Can create new columns that exist for the duration of the query
• Can perform calculations with values retrieved from numeric columns
Syntax
Select <column name1>, <column name2>, <expression> as
<column name3>,……...
From <table name>;
where,
• Column name is the name of the column to be printed
• Table name is the name of the table from which the data is to be selected
• Expression is any valid expression
Creating New Columns
Example:
proc sql;
select City, (AvgHigh - 32) * 5/9 as HighC format=5.1,
(AvgLow - 32) * 5/9 as LowC format=5.1,
(calculated HighC - calculated LowC)
as Range format=4.1
from day1.worldtemps;
quit;
City HighC LowC Range
Algiers 32.2 7.2 25.0
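SQLite has no CALCULATED keyword and no FORMAT= option. One way to reuse computed columns there is to compute them in an inner query and refer to their aliases in the outer query, rounding instead of formatting. This is a sketch using Python's sqlite3 module, not SAS. The Fahrenheit inputs 90 and 45 for Algiers are assumptions chosen because they reproduce the slide's 32.2, 7.2 and 25.0. The alias range_c is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical worldtemps row: average high/low in degrees Fahrenheit.
con.execute("create table worldtemps (city text, avghigh real, avglow real)")
con.execute("insert into worldtemps values ('Algiers', 90, 45)")

# The inner query creates HighC and LowC; the outer query reuses them,
# playing the role of SAS's "calculated HighC - calculated LowC".
row = con.execute(
    """
    select city, round(highc, 1), round(lowc, 1), round(highc - lowc, 1) as range_c
    from (select city,
                 (avghigh - 32) * 5.0 / 9 as highc,
                 (avglow - 32) * 5.0 / 9 as lowc
          from worldtemps)
    """
).fetchone()
print(row)  # ('Algiers', 32.2, 7.2, 25.0)
```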
CASE expressions
• Enables you to interpret and change some or all of the data values in a column
• Use conditional logic within a query by using a CASE expression to conditionally assign a value
• Can use a CASE expression anywhere a column name can be used
Syntax
Select <column name1>, <column name2>,
case [<column name>]
when <condition1> then <value1>
when <condition2> then <value2>
………
else <value n>
end [as <alias name>]
From <table name>;
where
• Column name is the name of the column in which the condition is to be satisfied. It is optional
• Value1, value2, etc. are the values assigned when the corresponding condition is true
• Alias name is the name of the new column that holds the assigned values
Assigning Values Conditionally
Example:
proc sql;
select City, Country, Latitude,
case
when Latitude gt 67 then "North Frigid"
when 67 ge Latitude ge 23 then "North Temperate"
when 23 gt Latitude gt -23 then "Torrid"
when -23 ge Latitude ge -67 then "South Temperate"
else "South Frigid"
end as ClimateZone
from day1.worldcitycoords
order by City;
quit;
Alice Springs Australia -24 South Temperate
Brisbane Australia -27 South Temperate
Buenos Aires Argentina -34 South Temperate
Chittagong Bangladesh 22 Torrid
Cordoba Argentina -31 South Temperate
Darwin Australia -12 Torrid
Kabul Afghanistan 35 North Temperate
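A sketch of the same CASE logic in SQLite via Python's sqlite3 module (not SAS). SQLite does not support SAS-style chained comparisons such as `67 ge Latitude ge 23`, so BETWEEN (which is inclusive) is used instead. The three rows are taken from the sample output above. The first WHEN that matches determines the value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Subset of worldcitycoords, using rows that appear in the sample output.
con.execute("create table worldcitycoords (city text, country text, latitude integer)")
con.executemany(
    "insert into worldcitycoords values (?, ?, ?)",
    [("Kabul", "Afghanistan", 35),
     ("Darwin", "Australia", -12),
     ("Cordoba", "Argentina", -31)],
)

rows = con.execute(
    """
    select city, latitude,
           case
               when latitude > 67 then 'North Frigid'
               when latitude between 23 and 67 then 'North Temperate'
               when latitude > -23 and latitude < 23 then 'Torrid'
               when latitude between -67 and -23 then 'South Temperate'
               else 'South Frigid'
           end as climatezone
    from worldcitycoords
    order by city
    """
).fetchall()
for r in rows:
    print(r)
# ('Cordoba', -31, 'South Temperate')
# ('Darwin', -12, 'Torrid')
# ('Kabul', 35, 'North Temperate')
```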
Assigning Values Conditionally
Example:
• Checks each of its arguments until it finds a non missing value, then returns that value
• If all of the arguments are missing values, then the COALESCE function returns a missing value
Syntax
coalesce(<argument1>, <argument2>, ……)
Example:
proc sql;
select Name, coalesce (LowPoint, "Not Available") as LowPoint
from day1.continents;
quit;
Name LowPoint
Africa Lake Assal
Antarctica Not Available
Asia Dead Sea
Australia Lake Eyre
Central Not Available
Europe Caspian Sea
North America Death Valley
Oceania Not Available
South America Valdes Peninsul
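A minimal SQLite sketch of COALESCE via Python's sqlite3 module (not SAS). It uses three rows from the output above. It also shows that COALESCE returns null (None in Python) when every argument is null.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table continents (name text, lowpoint text)")
con.executemany(
    "insert into continents values (?, ?)",
    [("Africa", "Lake Assal"), ("Antarctica", None), ("Asia", "Dead Sea")],
)

# Replace a missing LowPoint with a constant string.
rows = con.execute(
    """
    select name, coalesce(lowpoint, 'Not Available') as lowpoint
    from continents
    order by name
    """
).fetchall()
all_missing = con.execute("select coalesce(null, null)").fetchone()
print(rows)
# [('Africa', 'Lake Assal'), ('Antarctica', 'Not Available'), ('Asia', 'Dead Sea')]
print(all_missing)  # (None,)
```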
Specifying Column Attributes
Syntax
Select <column name1> label = "<description>" format = <format name>, <column name2> label = "<description>", …..
From <table name>;
where
• Description is the string that is to be used as the label
• Format name is the name of the format to be used for that corresponding column
Specifying Column Attributes
Example:
proc sql ;
title 'Areas of U.S. States in Square Miles';
select Name label="State", Area format=comma10.
from day1.unitedstates;
quit;
State Area
Alabama 52,423
Alaska 656,400
Arizona 114,000
Arkansas 53,200
California 163,700
Colorado 104,100
Connecticut 5,500
Delaware 2,500
Sorting Data
• Sort query results with an ORDER BY clause by specifying any of the columns in the table, including
unselected or calculated columns
• Can sort by more than one column by specifying the column names, separated by commas
• The default order is ascending; to sort in descending order, specify the keyword DESC after the column name
• Can sort by any column within the SELECT clause by specifying its numerical position
Syntax
Select <column name1>, <column name2>, …….
From <table name>
order by <column name> [desc], <column name> …. ;
where
• column name is the column by which the data is to be sorted
Sorting Data
Example:
proc sql;
title 'Country Populations';
select Name, Population format=comma10.
from day1.countries
order by Population;
quit;
’Country Populations’
Name Population
Andorra 64,634
Antigua 65,644
Barbados 258,534
Bahamas 275,703
Bahrain 591,800
Albania 3,407,400
Armenia 3,556,864
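A sketch of ascending, descending and positional sorting in SQLite via Python's sqlite3 module (not SAS). It uses four populations from the output above. The continent assignments are added for illustration. In the second query, `2` refers to the second item in the SELECT list.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table countries (name text, continent text, population integer)")
con.executemany(
    "insert into countries values (?, ?, ?)",
    [("Albania", "Europe", 3407400), ("Andorra", "Europe", 64634),
     ("Bahrain", "Asia", 591800), ("Armenia", "Asia", 3556864)],
)

# Default ascending order on one column.
ascending = con.execute(
    "select name, population from countries order by population").fetchall()
# Continent ascending, then population (SELECT position 2) descending.
by_two_keys = con.execute(
    "select continent, population, name from countries order by continent, 2 desc"
).fetchall()
print([n for n, _ in ascending])     # ['Andorra', 'Bahrain', 'Albania', 'Armenia']
print([r[2] for r in by_two_keys])   # ['Armenia', 'Bahrain', 'Albania', 'Andorra']
```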
• Can use logical, or Boolean, operators to construct a WHERE clause that contains two or more
expressions
Syntax
Select <column name1>, <column name2>, …….
From <table name>
where <condition>;
where
• Condition is any valid expression
Retrieving Rows That Satisfy a Condition
Comparison Operators
Conditional Operators
Operator       Definition
ANY            specifies that at least one of a set of values obtained from a subquery must satisfy a given condition
ALL            specifies that all of the values obtained from a subquery must satisfy a given condition
BETWEEN-AND    tests for values within an inclusive range
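A sketch of a WHERE clause using BETWEEN-AND and a Boolean AND in SQLite via Python's sqlite3 module (not SAS). It uses four cities and latitudes from the earlier uscitycoords output. It shows that BETWEEN includes both endpoints. SQLite does not support the ANY and ALL operators, so they are not demonstrated here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table uscitycoords (city text, state text, latitude integer)")
con.executemany(
    "insert into uscitycoords values (?, ?, ?)",
    [("Albany", "NY", 43), ("Amarillo", "TX", 35),
     ("Anchorage", "AK", 61), ("Atlanta", "GA", 34)],
)

# BETWEEN ... AND is inclusive, so Amarillo (35) and Albany (43) both qualify.
between = [r[0] for r in con.execute(
    "select city from uscitycoords where latitude between 35 and 43 order by city")]
# Two expressions combined with the logical operator AND.
combined = [r[0] for r in con.execute(
    "select city from uscitycoords where latitude > 30 and state <> 'TX' order by city")]
print(between)   # ['Albany', 'Amarillo']
print(combined)  # ['Albany', 'Anchorage', 'Atlanta']
```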
• Use an aggregate function (or summary function) to produce a statistical summary of data in a table
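• If a single column is specified, the function is calculated over the values in that column; if multiple arguments are specified, the calculation is performed across the listed arguments or columns within each row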
• Applies the function to the entire table, unless a GROUP BY clause is used
• Can use aggregate functions in the SELECT or HAVING clauses
• Combining, or rolling up, of rows occurs when
– The SELECT clause contains only columns that are specified within an aggregate function
– The WHERE clause, if there is one, contains only columns that are specified in the SELECT clause.
Syntax
Select <column name1>, aggr func(<column name2>),…….
From <table name>;
where
• aggr func is any valid aggregate function
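A sketch in SQLite via Python's sqlite3 module (not SAS) contrasting a one-argument aggregate, which collapses the whole table to one row, with a multi-argument call. In SQLite, only min() and max() switch to per-row behavior when given several arguments, and SAS's own rules may differ in detail. The Algiers temperatures (90/45) are assumed. "City B" and "City C" and their values are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical temperatures in degrees Fahrenheit.
con.execute("create table worldtemps (city text, avghigh integer, avglow integer)")
con.executemany(
    "insert into worldtemps values (?, ?, ?)",
    [("Algiers", 90, 45), ("City B", 70, 33), ("City C", 89, 41)],
)

# One argument: aggregate over the whole table, giving a single row.
whole = con.execute("select max(avghigh), count(*) from worldtemps").fetchall()
# Several arguments: SQLite's max() compares the arguments within each row.
per_row = con.execute(
    "select city, max(avghigh, avglow) from worldtemps order by city").fetchall()
print(whole)    # [(90, 3)]
print(per_row)  # [('Algiers', 90), ('City B', 70), ('City C', 89)]
```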
Summarizing Data
• Aggregate functions, such as the MAX function, can cause the same calculation to repeat for every row (remerging). This occurs if:
– The SELECT clause references a column that contains an aggregate function and other column(s) that
are not listed in the GROUP BY clause.
– One or more columns or column expressions that are listed in a HAVING clause are not included in a
subquery or a GROUP BY clause.
• You sometimes need to use an aggregate function so that its results can be used in another calculation, for example to express each row’s value as a percentage of the column total
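SQLite does not remerge summary statistics onto detail rows the way PROC SQL does. A common equivalent is a scalar subquery that computes the total once, which each row then reuses. This is a sketch using Python's sqlite3 module, not SAS. The names "A", "B" and "C" and their populations are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical rows.
con.execute("create table countries (name text, population integer)")
con.executemany(
    "insert into countries values (?, ?)",
    [("A", 100), ("B", 300), ("C", 600)],
)

# The subquery returns sum(population) = 1000; every row divides by it.
rows = con.execute(
    """
    select name, population,
           round(100.0 * population / (select sum(population) from countries), 1) as pct
    from countries
    order by name
    """
).fetchall()
print(rows)  # [('A', 100, 10.0), ('B', 300, 30.0), ('C', 600, 60.0)]
```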
Summarizing Data
• When a GROUP BY clause is used, you should also use an aggregate function in the SELECT clause or in a HAVING clause to instruct PROC SQL how to summarize the data for each group
• If a GROUP BY clause is used without an aggregate function, it is treated as if it were an ORDER BY clause
Syntax
Select <column name1>, aggr func(<column name2>), …….
From <table name>
group by <column name>;
where
• Column name is the column by which the data is to be grouped
Grouping Data
Example:
proc sql;
title 'Total Populations of World Continents';
select Continent, sum(Population) format=comma14. as TotalPopulation
from day1.countries
where Continent is not missing
group by Continent;
quit;
Continent TotalPopulation
Africa 38,072,182
Asia 155,369,051
Australia 18,255,944
Central America 599,881
Europe 11,505,780
South America 34,248,705
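A sketch of GROUP BY with SUM in SQLite via Python's sqlite3 module (not SAS). SAS's `is not missing` is written `is not null` here. The populations come from the earlier Country Populations output. The continent assignments and the "Country X" row with no continent are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table countries (name text, continent text, population integer)")
con.executemany(
    "insert into countries values (?, ?, ?)",
    [("Albania", "Europe", 3407400), ("Andorra", "Europe", 64634),
     ("Bahrain", "Asia", 591800), ("Armenia", "Asia", 3556864),
     ("Country X", None, 1000)],  # hypothetical row with a missing continent
)

# One output row per continent; rows with a null continent are filtered first.
rows = con.execute(
    """
    select continent, sum(population) as totalpopulation
    from countries
    where continent is not null
    group by continent
    order by continent
    """
).fetchall()
print(rows)  # [('Asia', 4148664), ('Europe', 3472034)]
```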
Filtering Grouped Data
• If a HAVING clause is used without a GROUP BY clause, it is treated as if it were a WHERE clause
Syntax
Select <column name1>, <column name2>,…….
From <table name>
Group by <column name>
Having <condition>;
where
• Condition is an expression to select certain groups
Filtering Grouped Data
Example:
proc sql;
title 'Total Populations of Continents with More than One Country';
select Continent,
sum(Population) as TotalPopulation format=comma16.,
count(*) as Count
from day1.countries
group by Continent
having count(*) gt 1
order by Continent;
quit;
’Total Populations of Continents with More than One Country’
Continent TotalPopulation Count
Africa 38,072,182 2
Asia 155,369,051 5
Central America 599,881 3
Europe 11,505,780 3
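A sketch of HAVING in SQLite via Python's sqlite3 module (not SAS). The WHERE clause filters individual rows, while HAVING filters whole groups after they are formed. Here, continents with only one country are dropped. The "Country X" row and its Africa assignment are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table countries (name text, continent text, population integer)")
con.executemany(
    "insert into countries values (?, ?, ?)",
    [("Albania", "Europe", 3407400), ("Andorra", "Europe", 64634),
     ("Bahrain", "Asia", 591800), ("Armenia", "Asia", 3556864),
     ("Country X", "Africa", 1000)],  # hypothetical single-country group
)

rows = con.execute(
    """
    select continent, sum(population) as totalpopulation, count(*) as cnt
    from countries
    group by continent
    having count(*) > 1
    order by continent
    """
).fetchall()
print(rows)  # [('Asia', 4148664, 2), ('Europe', 3472034, 2)]
```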
Selecting Data from More Than One Table by Using Joins
• Joining tables enables you to select data from multiple tables as if the data were contained in one table
Cartesian Product
• The most basic type of join simply lists two tables in the FROM clause of a SELECT statement
• Returns the Cartesian product of the tables, that is, each row from the first table is combined with every row from the second table
from the second table
Syntax
Select <column name1>, <column name2>,…….
From <table name1>, <table name2>;
where
• Table name1, table name2 are the tables for join
Different Types of Joins
Selecting Data from More Than One Table by Using Joins
Example:
proc sql ;
title 'Oil Production/Reserves of Countries';
select * from day1.oilprod, day1.oilrsvrs;
quit;
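A sketch in SQLite via Python's sqlite3 module (not SAS) showing that listing two tables without a join condition pairs every row of the first with every row of the second. The column BARRELSPERDAY, the column BARRELS and all row values are hypothetical. Two rows times three rows gives six.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables and values.
con.execute("create table oilprod (country text, barrelsperday integer)")
con.execute("create table oilrsvrs (country text, barrels integer)")
con.executemany("insert into oilprod values (?, ?)", [("X", 10), ("Y", 20)])
con.executemany("insert into oilrsvrs values (?, ?)",
                [("X", 100), ("Y", 200), ("Z", 300)])

# No join condition: the result is the Cartesian product.
rows = con.execute("select * from oilprod, oilrsvrs").fetchall()
print(len(rows))  # 6, that is 2 * 3
```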
Inner Joins
• Inner join returns only the subset of rows from the first table that matches rows from the second table
• Can specify the columns that you want to be compared for matching values in a WHERE clause or use
the keyword inner join
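Syntax
Select <column name1>, <column name2>,…….
From <table name1> as alias1 inner join <table name2> as alias2
on alias1.column = alias2.column;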
where
• Alias1, alias2 are the alias names given for abbreviating table names, to make it easy to qualify column
names
Selecting Data from More Than One Table by Using Joins
Example:
proc sql ;
title 'Oil Production/Reserves of Countries';
select * from day1.oilprod as p inner join day1.oilrsvrs as r
on p.country = r.country;
quit;
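A sketch in SQLite via Python's sqlite3 module (not SAS) showing that the INNER JOIN ... ON form and the WHERE-clause form return the same matching rows. Unmatched countries on either side (W and Z) are dropped. The columns other than COUNTRY, and all values, are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables; only the join column COUNTRY comes from the slide.
con.execute("create table oilprod (country text, barrelsperday integer)")
con.execute("create table oilrsvrs (country text, barrels integer)")
con.executemany("insert into oilprod values (?, ?)", [("W", 5), ("X", 10), ("Y", 20)])
con.executemany("insert into oilrsvrs values (?, ?)", [("X", 100), ("Y", 200), ("Z", 300)])

on_form = con.execute(
    """
    select p.country, p.barrelsperday, r.barrels
    from oilprod as p inner join oilrsvrs as r
      on p.country = r.country
    order by p.country
    """
).fetchall()

# The same inner join written with a WHERE clause.
where_form = con.execute(
    """
    select p.country, p.barrelsperday, r.barrels
    from oilprod as p, oilrsvrs as r
    where p.country = r.country
    order by p.country
    """
).fetchall()

print(on_form)                # [('X', 10, 100), ('Y', 20, 200)]
print(on_form == where_form)  # True
```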
Outer Joins
• Output includes rows that match and rows that do not match from the join’s source tables
• Nonmatching rows have null values in the columns from the unmatched table
Syntax
Select <column name1>, <column name2>,…….
From <table name1> as alias1 left join <table name2> as alias2
on alias1.column = alias2.column;
where
• Alias1, alias2 are the alias names given for abbreviating table names, to make it easy to qualify column
names
Selecting Data from More Than One Table by Using Joins
Example:
proc sql ;
title 'Coordinates of Capital Cities';
select Capital format=$20., Name "Country" format=$20.,
Latitude, Longitude
from day1.countries a left join day1.worldcitycoords b
on a.Capital = b.City and
a.Name = b.Country
order by Capital;
quit;
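A sketch of the same left join in SQLite via Python's sqlite3 module (not SAS). The column names follow the slide. The rows "Country X"/"City X" and "Country Y"/"City Y" and the coordinates are hypothetical. The capital with no match in worldcitycoords is kept, with nulls (None) in the columns from the right-hand table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical rows; column names follow the slide's example.
con.execute("create table countries (name text, capital text)")
con.execute(
    "create table worldcitycoords (city text, country text, latitude integer, longitude integer)")
con.executemany("insert into countries values (?, ?)",
                [("Country X", "City X"), ("Country Y", "City Y")])
con.execute("insert into worldcitycoords values ('City X', 'Country X', 10, 20)")

rows = con.execute(
    """
    select a.capital, a.name, b.latitude, b.longitude
    from countries as a left join worldcitycoords as b
      on a.capital = b.city and a.name = b.country
    order by a.capital
    """
).fetchall()
print(rows)  # [('City X', 'Country X', 10, 20), ('City Y', 'Country Y', None, None)]
```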
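• Nonmatching rows from the right-hand table (the second table listed in the FROM clause) are included with all matching rows in the output
Syntax
Select <column name1>, <column name2>,…….
From <table name1> as alias1 right join <table name2> as alias2
on alias1.column = alias2.column;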
where
• Alias1, alias2 are the alias names given for abbreviating table names, to make it easy to qualify column
names
Selecting Data from More Than One Table by Using Joins
Example:
Syntax
where
• Alias1, alias2 are the alias names given for abbreviating table names, to make it easy to qualify column
names
Selecting Data from More Than One Table by Using Joins
Example:
Syntax
Select <column name1>, <column name2>,…….
From <table name1>
where <column name> = (Select <column name>
From <table name2>
where <condition>);
where
• Column name is the column whose values are compared with the value returned by the inner query (subquery); when the comparison uses =, the subquery must return a single value
Using Subqueries to Select Data
Example:
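The original subquery example is not available. The block below is a minimal sketch in SQLite via Python's sqlite3 module (not SAS). The inner query returns one value (the largest European population), and the outer query keeps the rows that equal it. The data reuse figures from the Country Populations output, with assumed continent assignments. SQLite silently uses the first row if a scalar subquery returns several rows, whereas PROC SQL is expected to report an error in that case.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table countries (name text, continent text, population integer)")
con.executemany(
    "insert into countries values (?, ?, ?)",
    [("Albania", "Europe", 3407400), ("Andorra", "Europe", 64634),
     ("Bahrain", "Asia", 591800), ("Armenia", "Asia", 3556864)],
)

# The inner query produces a single value; the outer WHERE compares against it.
rows = con.execute(
    """
    select name, population
    from countries
    where population = (select max(population)
                        from countries
                        where continent = 'Europe')
    """
).fetchall()
print(rows)  # [('Albania', 3407400)]
```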
Example:
proc sql;
create index area
on day1.countries(area);
create index places
on day1.countries(name, continent);
quit;
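A sketch of the same two indexes in SQLite via Python's sqlite3 module (not SAS), listing them back with SQLite's PRAGMA statements. The table contents are omitted because only the index definitions matter here. In SAS, a simple index must have the same name as its column, which is why the example names it area. SQLite does not impose that rule.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table countries (name text, continent text, area integer)")

# Simple index on one column, then a composite index on two columns.
con.execute("create index area on countries(area)")
con.execute("create index places on countries(name, continent)")

# pragma index_list rows: (seq, name, unique, origin, partial)
index_names = sorted(row[1] for row in con.execute("pragma index_list(countries)"))
# pragma index_info rows: (seqno, cid, name); sort by seqno for column order
places_columns = [row[2] for row in sorted(con.execute("pragma index_info(places)"))]
print(index_names)     # ['area', 'places']
print(places_columns)  # ['name', 'continent']
```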
Indexes