ALBS - SAS Module 7

Database & Data Mining using SQL

Module - 7
Disclaimer: This material is protected under copyright act AnalytixLabs ©, 2011. Unauthorized use and/ or duplication of this
material or any part of this material including data, in any form without explicit and written permission from AnalytixLabs is
strictly prohibited. Any violation of this copyright will attract legal actions.

Learn to Evolve
Database

• A database is an organized collection of information

• Common examples of a database would be a telephone book, mailing list, etc.

Database Terminology

• Table - a list of related information presented in a column/row format

• Row is referred to as a Record

• Column in a table is a category of information referred to as a Field


PROC SQL

• The SQL procedure (PROC SQL) is the SAS implementation of SQL
• Structured Query Language (SQL) is a language that retrieves and updates data in relational tables and databases
• PROC SQL can be used with any SAS data set (table)
• SAS language elements such as global statements (TITLE, OPTIONS, etc.), data set options, functions, informats, and formats can be used with PROC SQL
PROC SQL can
• Generate reports
• Generate summary statistics
• Retrieve data from tables or views
• Combine data from tables or views
• Create tables, views, and indexes
• Update the data values in PROC SQL tables
• Update and retrieve data from database management system (DBMS) tables
• Modify a PROC SQL table by adding, modifying, or dropping columns.
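The update and modify capabilities in the list above can be sketched as follows. This is only an illustration, not part of the original material: it assumes that the day1.countries table used elsewhere in this module has numeric Population and Area columns, and work.countries_copy and Density are hypothetical names chosen for the example.

```sas
proc sql;
   /* Work on a copy so the source table is left untouched (hypothetical name) */
   create table work.countries_copy as
      select Name, Population, Area
         from day1.countries;

   /* Add a new numeric column */
   alter table work.countries_copy
      add Density num format=8.2;

   /* Update the data values in the new column */
   update work.countries_copy
      set Density = Population / Area;

   /* Drop a column that is no longer needed */
   alter table work.countries_copy
      drop Area;
quit;
```

Working on a copy in the WORK library keeps the permanent table unchanged while the statements are being tried out.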
Terminology

1. Queries - retrieve data from a table, view, or DBMS (SELECT statements)

2. Views - stored queries; unlike tables, they do not contain data, only the SELECT statement that retrieves it

3. Null values - missing values. A null value is not the same as a blank or zero value.

4. PROC SQL statements execute immediately, without a RUN statement. Use the QUIT statement to terminate the procedure.
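As a rough sketch of the difference between a table and a view, the following stores a query as a view and then queries it. The view name work.europe_v is hypothetical; day1.countries and its Name, Population, and Continent columns are taken from the examples later in this module.

```sas
proc sql;
   /* The view stores only the SELECT statement, not the rows */
   create view work.europe_v as
      select Name, Population
         from day1.countries
         where Continent = "Europe";

   /* The stored query runs each time the view is referenced */
   select *
      from work.europe_v;
quit;
```

Note that no RUN statement is needed: each statement executes as soon as it is submitted, and QUIT ends the procedure.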
Creating Table (1/2)

CREATE TABLE statement

• Enables you to create tables without rows from column definitions or to create tables from a query
result

Syntax
Create table <table name>
(
<column 1> <data type> [not null] [unique] [<column constraint>],
.........
<column n> <data type> [not null] [unique] [<column constraint>]
);

where
• Table name is the name of the table
• Column 1 – column n are column headings
Creating Table (2/2)

• Data type is the type of data to be stored in that column

• Not null is a constraint which specifies that the column's value must not be left missing

• Unique specifies that no two records can have the same value for this column. Unless the condition
not null is also specified for this column, the attribute value null is allowed and two records having
the attribute value null for this column do not violate the constraint

• Rules
– For each column, a name and a data type must be specified and the column name must be unique
within the table definition

– Column definitions are separated by comma

– Not case sensitive


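Example:

proc sql;
create table EMP (
EMPNO num not null,
ENAME char(30) not null,
JOB char(10),
MGR num,
HIREDATE date,
SAL num format=8.2,
DEPTNO num
);
quit;

• Creates a table called EMP in which the columns EMPNO and ENAME cannot have null values
• PROC SQL stores every column as either numeric or character, so the columns are declared as num, char(n), or date (a numeric column with a date format); Oracle-style types such as varchar2(n) are not PROC SQL data types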
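To see a NOT NULL constraint in action, the sketch below inserts one complete row and then one row that omits a required column. The table work.emp_demo and its values are hypothetical, and the exact log message for the rejected row is not reproduced here.

```sas
proc sql;
   /* Hypothetical demo table with two NOT NULL columns */
   create table work.emp_demo
      (EmpNo num not null,
       EName char(30) not null,
       Job   char(10));

   /* Complete row: accepted */
   insert into work.emp_demo
      values (7369, 'SMITH', 'CLERK');

   select * from work.emp_demo;

   /* EName is omitted, so it would be missing; this insert is
      expected to be rejected by the NOT NULL constraint */
   insert into work.emp_demo (EmpNo, Job)
      values (7499, 'SALESMAN');
quit;
```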
Creating Tables from a Query Result

• Use a CREATE TABLE statement, and place it before the SELECT statement
• Data is derived from the table that is referenced in the query’s FROM clause
• New table’s column names are as specified in the query’s SELECT clause list
Syntax
Create table <table name1> as
Select <column name1>, <column name2>, ………..
From <table name2>;
where,
• Column name is the name of the column to be printed
• Table name2 is the name of the table from which the data is to be selected
• Table name 1 is the new table to be created
Example:

proc sql outobs=10;
title 'Densities of Countries';
create table day1.densities as
select Name "Country" format=$15.,
Population format=comma10.0,
Area as SquareMiles,
Population/Area format=6.2 as Density
from day1.countries;
quit;
Retrieving Data

Select statement
• Retrieves data from tables

• SELECT statement must contain a SELECT clause and a FROM clause

Syntax:

Select <column name1>, <column name2>, ………..


From <table name>;
where,
• Column name is the name of the column to be printed
• Table name is the name of the table from which the data is to be selected
Example:

proc sql;
title 'U.S. Cities with Their States and Coordinates';
select *
from day1.uscitycoords;
quit;

U.S. Cities with Their States and Coordinates

City          State   Latitude   Longitude
Albany        NY            43         -74
Albuquerque   NM            36        -106
Amarillo      TX            35        -102
Anchorage     AK            61        -150
Annapolis     MD            39         -77
Atlanta       GA            34         -84
• In the above example, * is used to select all variables in the dataset.

Example:

proc sql;
title 'U.S. Cities and Their States';
select City, State
from day1.uscitycoords;
quit;

U.S. Cities and Their States

City          State
Albany        NY
Albuquerque   NM
Amarillo      TX
Anchorage     AK
Annapolis     MD

• In this example, only two variables will be printed.
Selecting Unique Values

• Use the keyword distinct in the SELECT clause, to display unique values in a table

• Eliminates duplicate rows, or rows in which the values in all of the columns match when all of a
table’s columns are specified in a SELECT clause with the DISTINCT keyword

Syntax:
Select distinct <variable name>,……..
from <table name>;
where,
• Variable name is the variable to be selected

• Table name is the table from which it is to be selected


Example:

proc sql;
title 'Continents of the United States';
select distinct Continent
from day1.unitedstates;
quit;

Continents of the United States

Continent
North America
Oceania

• Unique values in the variable Continent will be displayed

proc sql;
title 'Continents of the United States';
select distinct *
from day1.newstates;
quit;

Continents of the United States

state      date        population
New York   02JAN2008      4000000
New York   02JAN2008      5000000
Nwe York   01JAN2008            .

• Unique rows in the table will be displayed
Creating New Columns

• Can create new columns that exist for the duration of the query
• Can perform calculations with values retrieved from numeric columns
Syntax
Select <column name1>, <column name2>, <expression> as
<column name3>,……...
From <table name>;
where,
• Column name is the name of the column to be printed
• Table name is the name of the table from which the data is to be selected
• Expression is any valid expression
Example:

proc sql;
title 'Low Temperatures in Celsius';
select City, (AvgLow - 32) * 5/9 format=4.1
from day1.worldtemps;
quit;

Low Temperatures in Celsius

City
Algiers       7.2
Amsterdam     0.6
Athens        5.0
Auckland      6.7
Bangkok      20.6

proc sql;
title 'Low Temperatures in Celsius';
select City, (AvgLow - 32) * 5/9 as LowCelsius format=4.1
from day1.worldtemps;
quit;

Low Temperatures in Celsius

City        LowCelsius
Algiers            7.2
Amsterdam          0.6
Athens             5.0
Auckland           6.7
Bangkok           20.6
Referring to a Calculated Column by Alias

Example:

proc sql;
select City, (AvgHigh - 32) * 5/9 as HighC format=5.1,
(AvgLow - 32) * 5/9 as LowC format=5.1,
(calculated HighC - calculated LowC) as Range format=4.1
from day1.worldtemps;
quit;

City        HighC   LowC   Range
Algiers      32.2    7.2    25.0
Amsterdam    21.1    0.6    20.6
Athens       31.7    5.0    26.7
Auckland     23.9    6.7    17.2
Bangkok      35.0   20.6    14.4

Assigning Values Conditionally

CASE expressions
• Enables you to interpret and change some or all of the data values in a column
• Use conditional logic within a query by using a CASE expression to conditionally assign a value
• Can use a CASE expression anywhere a column name can be used
Syntax
Select <column name1>, <column name2>,
case [<column name>]
when <condition1> then <value1>
when <condition2> then <value2>
………
else <value n>
end [as <alias name>]
From <table name>;
where
• Column name is the name of the column that is tested against the WHEN values. It is optional; if it is omitted, each WHEN clause contains a complete condition
• Value1, value2 etc. are the values assigned when the corresponding condition is true
• Alias name is the name of the column in which the values are to be stored
Example:

proc sql;
title 'Climate Zones of World Cities';
select City, Country, Latitude,
case
when Latitude gt 67 then "North Frigid"
when 67 ge Latitude ge 23 then "North Temperate"
when 23 gt Latitude gt -23 then "Torrid"
when -23 ge Latitude ge -67 then "South Temperate"
else "South Frigid"
end as ClimateZone
from day1.worldcitycoords
order by City;
quit;

Climate Zones of World Cities

City            Country       Latitude   ClimateZone
Adelaide        Australia          -35   South Temperate
Algiers         Algeria             37   North Temperate
Alice Springs   Australia          -24   South Temperate
Brisbane        Australia          -27   South Temperate
Buenos Aires    Argentina          -34   South Temperate
Chittagong      Bangladesh          22   Torrid
Cordoba         Argentina          -31   South Temperate
Darwin          Australia          -12   Torrid
Kabul           Afghanistan         35   North Temperate
Example:

proc sql;
title 'Assigning Regions to Continents';
select Name, Continent,
case Continent
when "North America" then "Continental U.S."
when "Oceania" then "Pacific Islands"
else "None"
end as Region
from day1.unitedstates;
quit;

Assigning Regions to Continents

Name          Continent       Region
Alabama       North America   Continental U.S.
Alaska        North America   Continental U.S.
Arizona       North America   Continental U.S.
Arkansas      North America   Continental U.S.
California    North America   Continental U.S.
Colorado      North America   Continental U.S.
Connecticut   North America   Continental U.S.

Replacing Missing Values

Using Coalesce Function


• Enables you to replace missing values in a column with a new value

• Checks each of its arguments until it finds a non missing value, then returns that value

• If all of the arguments are missing values, then the COALESCE function returns a missing value

Syntax

Select <column name1>, <column name2>,


coalesce(<column name>, <value>),…..
From <table name>;
where
• Column name is the name of the column which is to be checked for missing values
• Value is the value to be used for replacing missing values
Example:

proc sql;
select Name, coalesce(LowPoint, "Not Available") as LowPoint
from day1.continents;
quit;

Name            LowPoint
Africa          Lake Assal
Antarctica      Not Available
Asia            Dead Sea
Australia       Lake Eyre
Central         Not Available
Europe          Caspian Sea
North America   Death Valley
Oceania         Not Available
South America   Valdes Peninsul
Specifying Column Attributes

• Determine how SAS data is displayed


• If not specified, then PROC SQL uses attributes that are already saved in the table or, if no
attributes are saved, then it uses the default attributes
• Different attributes
– Format
– Informat
– Label
– Length

Syntax
Select <column name1> label = "<description>" format = <format name>,
<column name2> label = "<description>",…..
From <table name>;

where
• Description is the string that is to be used as the label
• Format name is the name of the format to be used for that corresponding column
Example:

proc sql;
title 'Areas of U.S. States in Square Miles';
select Name label="State", Area format=comma10.
from day1.unitedstates;
quit;

State              Area
Alabama          52,423
Alaska          656,400
Arizona         114,000
Arkansas         53,200
California      163,700
Colorado        104,100
Connecticut       5,500
Delaware          2,500
Sorting Data

• Sort query results with an ORDER BY clause by specifying any of the columns in the table, including unselected or calculated columns

• Sort by more than one column by listing the column names, separated by commas

• The default order is ascending; to sort in descending order, specify the keyword DESC after the column name

• Sort by any column in the SELECT clause by specifying its numerical position in that clause

• Nulls, or missing values, sort before character or numeric data

Syntax
Select <column name1>, <column name2>,…….
From <table name>
order by <column name> [desc], <column name> …. ;

where
• column name is the column by which the data is to be sorted
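Sorting by column position and the placement of missing values can be illustrated with a short sketch. It assumes that day1.countries has the Name, Continent, and Population columns used in other examples in this module, and that some rows have a missing Continent.

```sas
proc sql outobs=10;
   title 'Countries Sorted by Column Position';
   select Name, Continent, Population format=comma14.
      from day1.countries
      /* 2 = Continent (ascending), 3 = Population (descending) */
      order by 2, 3 desc;
quit;
```

Because missing values sort first in ascending order, any rows with a missing Continent would appear at the top of the result.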
Example:

proc sql;
title 'Country Populations';
select Name, Population format=comma10.
from day1.countries
order by Population;
quit;

Country Populations

Name         Population
Andorra          64,634
Antigua          65,644
Barbados        258,534
Bahamas         275,703
Bahrain         591,800
Albania       3,407,400
Armenia       3,556,864

proc sql;
title 'World Topographical Features';
select Name, Type
from day1.features
order by Type desc, Name;
quit;

World Topographical Features

Name           Type
Angel Falls    Waterfall
Andaman Sea
Baltic Sea
Bering Sea
Black Sea
Amazon River
Amur River
Retrieving Rows That Satisfy a Condition

Using WHERE clause

• Enables you to retrieve only the rows that satisfy a condition

• Can contain any of the columns in a table, including unselected columns

• Can use logical, or Boolean, operators to construct a WHERE clause that contains two or more
expressions

Syntax

Select <column name1>, <column name2>,…….


From <table name>
where <condition>;

where
• Condition is any valid expression
Example:

proc sql;
title 'Countries in Europe';
select Name, Population format=comma10.
from day1.countries
where Continent = "Europe";
quit;

Countries in Europe

Name       Population
Albania     3,407,400
Andorra        64,634
Austria     8,033,746

proc sql;
title 'Countries in Africa with Populations over 100000';
select Name, Population format=comma10.
from day1.countries
where Continent = "Africa" and Population gt 100000
order by Population desc;
quit;

Countries in Africa with Populations over 100000

Name       Population
Algeria    28,171,132
Angola      9,901,050
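Comparison Operators
Symbol   Mnemonic   Definition
=        EQ         equal to
^=       NE         not equal to
>        GT         greater than
<        LT         less than
>=       GE         greater than or equal to
<=       LE         less than or equal to

Logical (Boolean) Operators
Symbol   Mnemonic   Definition
&        AND        true if both expressions are true
|        OR         true if at least one expression is true
^        NOT        reverses the result of the expression
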
Conditional Operators
Operator                Definition
ANY                     specifies that at least one of a set of values obtained from a subquery must satisfy a given condition
ALL                     specifies that all of the values obtained from a subquery must satisfy a given condition
BETWEEN - AND           tests for values within an inclusive range
CONTAINS                tests for values that contain a specified string
EXISTS                  tests for the existence of a set of values obtained from a subquery
IN                      tests for values that match one of a list of values
IS NULL or IS MISSING   tests for missing values
LIKE                    tests for values that match a specified pattern
=*                      tests for values that sound like a specified value
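The operators that do not require a subquery can be tried with a few short queries. This is a sketch only: it assumes the Name, Continent, and Population columns of day1.countries seen in other examples, and the literal values (the population range, "land", "Algeria") are arbitrary choices for illustration.

```sas
proc sql;
   /* BETWEEN-AND: inclusive range */
   select Name, Population format=comma14.
      from day1.countries
      where Population between 1000000 and 10000000;

   /* IN: match one of a list of values */
   select Name, Continent
      from day1.countries
      where Continent in ("Africa", "Asia");

   /* LIKE: % matches any number of characters, _ matches one */
   select Name
      from day1.countries
      where Name like "A%";

   /* CONTAINS: the string may appear anywhere in the value */
   select Name
      from day1.countries
      where Name contains "land";

   /* IS MISSING (same as IS NULL) */
   select Name
      from day1.countries
      where Continent is missing;

   /* =* : sounds-like comparison */
   select Name
      from day1.countries
      where Name =* "Algeria";
quit;
```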


Summarizing Data

• Use an aggregate function (or summary function) to produce a statistical summary of data in a table
• If more than one argument is specified, the function is calculated across the listed columns within each row instead of down a single column
• Applies the function to the entire table, unless a GROUP BY clause is used
• Can use aggregate functions in the SELECT or HAVING clauses
• Combining, or rolling up, of rows occurs when
– The SELECT clause contains only columns that are specified within an aggregate function
– The WHERE clause, if there is one, contains only columns that are specified in the SELECT clause.

Syntax
Select <column name1>, aggr func(<column name2>),…….
From <table name>;

where
• aggr func is any valid aggregate function
Example:

proc sql;
title 'Mean Temperatures for World Cities';
select City, Country, mean(AvgHigh, AvgLow) as MeanTemp
from day1.worldtemps
where calculated MeanTemp gt 60
order by MeanTemp desc;
quit;

Mean Temperatures for World Cities

City       Country    MeanTemp
Bangkok    Thailand         82
Bombay     India            79
Calcutta   India          76.5
Cairo      Egypt          71.5

proc sql;
title 'World Oil Reserves';
select sum(Barrels) format=comma18. as TotalBarrels
from day1.oilrsvrs;
quit;

World Oil Reserves

TotalBarrels
713,300,000,000
Summarizing Data

• Aggregate functions, such as the MAX function, can cause the same calculation to repeat for every row; PROC SQL calls this remerging the summary statistic with the original data

• This occurs when


– The SELECT clause references a column that contains an aggregate function that is not listed in a
GROUP BY clause.

– The SELECT clause references a column that contains an aggregate function and other column(s) that
are not listed in the GROUP BY clause.

– One or more columns or column expressions that are listed in a HAVING clause are not included in a
subquery or a GROUP BY clause.

• This behavior is useful when the result of an aggregate function is needed in another calculation, such as a percentage of a total
Example:

proc sql;
title 'Largest Country Populations';
select Name, Population format=comma20.,
max(Population) as MaxPopulation format=comma20.
from day1.countries
order by Population desc;
quit;

Largest Country Populations

Name           Population    MaxPopulation
Bangladesh    126,390,000      126,390,000
Argentina      34,248,705      126,390,000
Algeria        28,171,132      126,390,000
Australia      18,255,944      126,390,000
Afghanistan    17,070,323      126,390,000

proc sql;
title 'Percentage of World Population in Countries';
select Name, Population format=comma14.,
(Population / sum(Population) * 100) as Percentage format=comma8.2
from day1.countries
order by Percentage desc;
quit;

Percentage of World Population in Countries

Name           Population    Percentage
Bangladesh    126,390,000         48.98
Argentina      34,248,705         13.27
Algeria        28,171,132         10.92
Australia      18,255,944          7.07
Afghanistan    17,070,323          6.62
Angola          9,901,050          3.84
Grouping Data

• GROUP BY clause groups data by a specified column or columns

• When a GROUP BY clause is used, also use an aggregate function in the SELECT clause or in a HAVING clause to instruct PROC SQL how to summarize the data for each group

• If a GROUP BY clause is used without an aggregate function, it is treated as if it were an ORDER BY clause

Syntax

Select <column name1>, <column name2>,…….


From <table name>
Group by <column name>;

where
• Column name is the column by which the data is to be grouped
Example:

proc sql;
title 'Total Populations of World Continents';
select Continent, sum(Population) format=comma14. as TotalPopulation
from day1.countries
where Continent is not missing
group by Continent;
quit;

Total Populations of World Continents

Continent         TotalPopulation
Africa                 38,072,182
Asia                  155,369,051
Australia              18,255,944
Central America           599,881
Europe                 11,505,780
South America          34,248,705
Filtering Grouped Data

• Use a HAVING clause with a GROUP BY clause to filter grouped data

• Displays only the groups that satisfy the HAVING expression

• If a HAVING clause is used without a GROUP BY clause, it is treated as if it were a WHERE clause

Syntax
Select <column name1>, <column name2>,…….
From <table name>
Group by <column name>
Having <condition>;

where
• Condition is an expression to select certain groups
Example:

proc sql;
title 'Total Populations of Continents with More than One Country';
select Continent,
sum(Population) as TotalPopulation format=comma16.,
count(*) as Count
from day1.countries
group by Continent
having count(*) gt 1
order by Continent;
quit;

Total Populations of Continents with More than One Country

Continent         TotalPopulation   Count
Africa                 38,072,182       2
Asia                  155,369,051       5
Central America           599,881       3
Europe                 11,505,780       3
Selecting Data from More Than One Table by Using Joins

• Joining tables enables you to select data from multiple tables as if the data were contained in one table

Cartesian Product

• The most basic type of join simply lists two tables, separated by a comma, in the FROM clause of a SELECT statement

• Returns the Cartesian product of the tables, i.e., each row from the first table is combined with every row from the second table

Syntax
Select <column name1>, <column name2>,…….
From <table name1>, <table name2>;

where
• Table name1, table name2 are the tables for join
Different Types of Joins

Example:

proc sql;
title 'Oil Production/Reserves of Countries';
select * from day1.oilprod, day1.oilrsvrs;
quit;

Oil Production/Reserves of Countries

Country    PerDay   Country   Barrels
Algeria   1400000   Algeria     9.2E9
Canada    2500000   Algeria     9.2E9
China     3000000   Algeria     9.2E9
Egypt      900000   Algeria     9.2E9
Selecting Data from More Than One Table by Using Joins

Inner Joins
• Inner join returns only the subset of rows from the first table that matches rows from the second table
• Can specify the columns that you want to be compared for matching values in a WHERE clause or use
the keyword inner join

Syntax

Select <column name1>, <column name2>,…….


From <table name1> as alias1 inner join <table name2> as alias2
on alias1.column = alias2.column;

where
• Alias1, alias2 are the alias names given for abbreviating table names, to make it easy to qualify column
names
Example:

proc sql;
title 'Oil Production/Reserves of Countries';
select * from day1.oilprod as p inner join day1.oilrsvrs as r
on p.country = r.country;
quit;

Oil Production/Reserves of Countries

Country    PerDay   Country   Barrels
Algeria   1400000   Algeria     9.2E9
Canada    2500000   Canada        7E9
China     3000000   China      2.5E10
Egypt      900000   Egypt         4E9
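The same inner join can also be written with a WHERE clause instead of the INNER JOIN keyword, as mentioned in the description of inner joins. The sketch below assumes the Country, PerDay, and Barrels columns shown in the output of the oil examples; selecting named columns instead of * avoids returning the Country column twice.

```sas
proc sql;
   title 'Oil Production/Reserves of Countries';
   /* Listing both tables and filtering in WHERE gives an inner join */
   select p.Country, p.PerDay, r.Barrels
      from day1.oilprod as p, day1.oilrsvrs as r
      where p.Country = r.Country;
quit;
```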
Selecting Data from More Than One Table by Using Joins

Outer Joins
• Output includes rows that match and rows that do not match from the join’s source tables
• Non matching rows have null values in the columns from the unmatched table

Left Outer Join


• Left outer join lists matching rows and rows from the left-hand table (the first table listed in the FROM
clause) that do not match any row in the right-hand table

Syntax
Select <column name1>, <column name2>,…….
From <table name1> as alias1 left join <table name2> as alias2
on alias1.column = alias2.column;
where
• Alias1, alias2 are the alias names given for abbreviating table names, to make it easy to qualify column
names
Example:

proc sql;
title 'Coordinates of Capital Cities';
select Capital format=$20., Name "Country" format=$20.,
Latitude, Longitude
from day1.countries a left join day1.worldcitycoords b
on a.Capital = b.City and
a.Name = b.Country
order by Capital;
quit;

Coordinates of Capital Cities

Capital           Country      Latitude   Longitude
Algiers           Algeria            37           3
Andorra la Vell   Andorra             .           .
Baku              Azerbaijan          .           .
Bridgetown        Barbados            .           .
Buenos Aires      Argentina         -34         -59
Canberra          Australia           .           .
Selecting Data from More Than One Table by Using Joins

Right Outer Join

• Non matching rows from the right-hand table (the second table listed in the FROM clause) are included
with all matching rows in the output

Syntax

Select <column name1>, <column name2>,…….


From <table name1> as alias1 right join <table name2> as alias2
on alias1.column = alias2.column;

where
• Alias1, alias2 are the alias names given for abbreviating table names, to make it easy to qualify column
names
Example:

proc sql outobs=10;
title 'Populations of Capitals Only';
select City format=$20., Country "Country" format=$20.,
Population
from day1.countries right join day1.worldcitycoords
on Capital = City and
Name = Country
order by City;
quit;

Populations of Capitals Only

City            Country      Population
Adelaide        Australia             .
Algiers         Algeria        28171132
Alice Springs   Australia             .
Brisbane        Australia             .
Buenos Aires    Argentina      34248705
Chittagong      Bangladesh            .
Cordoba         Argentina             .
Selecting Data from More Than One Table by Using Joins

Full Outer Join


• Selects all matching and non matching rows

Syntax

Select <column name1>, <column name2>,…….


From <table name1> as alias1 full join <table name2> as alias2
on alias1.column = alias2.column;

where
• Alias1, alias2 are the alias names given for abbreviating table names, to make it easy to qualify column
names
Example:

proc sql outobs=10;
title 'Populations and/or Coordinates of World Cities';
select City "#City#(WORLDCITYCOORDS)" format=$20.,
Capital "#Capital#(COUNTRIES)" format=$20.,
Population, Latitude, Longitude
from day1.countries full join day1.worldcitycoords
on Capital = City and
Name = Country;
quit;

Populations and/or Coordinates of World Cities

City                Capital
(WORLDCITYCOORDS)   (COUNTRIES)       Population   Latitude   Longitude
Adelaide                                       .        -35         138
Algiers             Algiers             28171132         37           3
Alice Springs                                  .        -24         134
                    Andorra la Vell        64634          .           .
                    Baku                 7760064          .           .
                    Bridgetown            258534          .           .
Brisbane                                       .        -27         153
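In a full outer join, each row may have its key in only one of the two tables, which is why City and Capital are blank in alternating rows above. One common way to present a single key column is to combine the two with COALESCE. The sketch below reuses the tables and columns of the example; CityName is a hypothetical alias.

```sas
proc sql outobs=10;
   title 'Populations and/or Coordinates of World Cities';
   /* COALESCE returns City when it is present, otherwise Capital */
   select coalesce(City, Capital) as CityName format=$20.,
          Population, Latitude, Longitude
      from day1.countries full join day1.worldcitycoords
         on Capital = City and
            Name = Country
      order by CityName;
quit;
```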
Using Subqueries to Select Data

• A subquery, or inner query, is a query-expression that is nested as part of another query-expression


• A subquery (enclosed in parentheses) selects rows from one table based on values in another table
• Can return a single value or multiple values
• Most often used in the WHERE and the HAVING expressions

Syntax
Select <column name1>, <column name2>,…….
From <table name1>
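where <column name> = (Select <column name>
From <table name2>
where <condition>) ;

where
• Column name (in the outer WHERE clause) is the column whose value is compared with the value returned by the inner query
• With a comparison operator such as =, the inner query must select one column and return a single value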
Example:

proc sql;
title 'U.S. States with Population Greater than Austria';
select Name "State", population format=comma10.
from day1.unitedstates
where population gt
(select population from day1.countries
where name = "Austria");
quit;

U.S. States with Population Greater than Austria

State        Population
California   31,518,948
Florida      13,814,408
Illinois     11,813,091

proc sql outobs=5;
title 'Populations of Major Oil Producing Countries';
select name 'Country', Population format=comma15.
from day1.countries
where Name in
(select Country from day1.oilprod);
quit;

Populations of Major Oil Producing Countries

Country        Population
Algeria        28,171,132
Canada         28,392,302
China       1,202,215,077
Egypt          59,912,259
Indonesia     202,393,859
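The EXISTS and ALL operators from the conditional operators table are also used with subqueries. The following sketch assumes the same day1.countries and day1.oilprod tables; the choice of "Europe" as the comparison group is arbitrary.

```sas
proc sql;
   /* EXISTS with a correlated subquery: keep a country only if
      at least one matching row is found in the oil production table */
   select c.Name, c.Population format=comma15.
      from day1.countries as c
      where exists
         (select *
             from day1.oilprod as o
             where o.Country = c.Name);

   /* ALL: population must exceed that of every European country */
   select Name, Population format=comma15.
      from day1.countries
      where Population gt all
         (select Population
             from day1.countries
             where Continent = "Europe");
quit;
```

The first query should return the same countries as the IN example above, since both test for a matching Country value.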
Indexes

• An index is a file that is associated with a table


• Indexes can provide quick access to small subsets of data
• The name of a simple index must be the same as the name of the column that it indexes
• To ensure that each value of the indexed column (or each combination of values of the columns in a
composite index) is unique, use the UNIQUE keyword
• An index is updated each time the rows in a table are added, deleted, or modified
Syntax
Create index <index name>
on <table name>(<column name>);
where,
• Index name is the name of the index
• Table name is the name of the table for which the index is to be created
• Column name is the name of the column based on which the index is to be created
Indexes

Example:

proc sql;
create index area
on day1.countries(area);

proc sql;
create index places
on day1.countries(name, continent);
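A unique index, described earlier but not shown in the examples, can be sketched in the same way. This assumes that every Name value in day1.countries is distinct; if duplicates exist, the index is not created. The index name must match the column name because it is a simple index.

```sas
proc sql;
   /* UNIQUE rejects duplicate values in the indexed column */
   create unique index Name
      on day1.countries(Name);

   /* Remove the index when it is no longer needed */
   drop index Name
      from day1.countries;
quit;
```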
Indexes

Tips for Creating and Using Indexes


• The name of the composite index cannot be the same as the name of one of the columns in the table.
• If you use two columns to access data regularly, such as a first name column and a last name column
from an employee database, then you should create a composite index for the columns.
• Keep the number of indexes to a minimum to reduce disk space and update costs.
• Use indexes for queries that retrieve a relatively small number of rows (less than 15%).
• In general, indexing a small table does not result in a performance gain.
• In general, indexing on a column with a small number (less than 6 or 7) of distinct values does not
result in a performance gain.
• The same column can be used in a simple index and in a composite index. However, for tables that have a primary key integrity constraint, do not create more than one index that is based on the same column as the primary key
