IBM Informix indexing techniques:
which one to use when?
Eric Vercelletto Session A12
Begooden IT Consulting 4/23/2013 3:35 PM
Agenda / methodology
• Introduction to Response Time measuring
• Identify the relevant indexing techniques
• Describe implementation method
• Confirm/recognize its use by accurate monitoring
• Measure its efficiency as response time and
effective use in the database (sqltrace, sqexplain)
• Identify pros and cons
4/24/2013 Session F12 2
Introduction
• Begooden IT Consulting is an IBM ISV company, mainly
focused on Informix technology services.
• Our 15+ years of experience within Informix Software
France and Portugal helped us to acquire in-depth
product knowledge as well as solid field experience.
• Our services include Informix implementation auditing,
performance tuning, issue management, administration
mentoring …
• We also happen to be the Querix reseller for France and
French-speaking countries (except Québec and Louisiana)
• The company is based in Pont l’Abbé, Finistère, France
Some basics not to forget about
There are 2 ways to measure response times
• The « cold » measure: the response time is measured just after
starting the engine, when data and index pages are not yet loaded
into Shared Memory IFMX buffers. Disk IO must be performed to
read the data and index pages, which will increase the RT.
• The « hot » measure: RT is measured when data and index pages
are already loaded into SHMEM. Little or no disk IO => RT is much shorter.
• This point often explains surprising RT differences, depending on
how the data is accessed.
• Broad-range or DS queries most often access data and/or indexes in
disk pages
• OLTP queries mostly access data and indexes in SHMEM pages
Derived thoughts and facts
• Reading data pages and/or index pages from disk always takes
longer than reading them in SHMEM. Full table scans can take minutes or
more, depending on table size
• Reading data pages in SHMEM is very fast. A full scan of a
table in SHMEM takes fractions of a second or seconds, rarely
more.
• Reading index pages in SHMEM is also very fast. In addition,
due to the B-TREE structure, reading index pages
generally covers more content than reading data pages.
• This often makes it difficult to compare the efficiency
of 2 different indexes on the same table when reading in
SHMEM.
Derived thoughts and facts (continued)
• When running hot measures on indexes, the differences
can be as low as milliseconds BUT …
• Repeating 3 wasted milliseconds millions of times can
make a difference!
• When Response Times get down to such a low level, sqltrace
is the tool you need to understand the query behaviour.
• In certain situations, saving milliseconds on a query will
make the difference. In other situations, saving seconds will
not make the difference.
• A bad response time can be caused by inappropriate
indexing, but can also be caused by some « unusual »
logic that makes the applications and the server
perform unnecessary work.
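To put the "3 milliseconds repeated millions of times" remark into numbers, here is a small arithmetic sketch. Only the 3 ms figure comes from the slide; the execution counts and the function name `wasted_time` are assumptions chosen for illustration.

```python
# Illustrative arithmetic only: the 3 ms figure comes from the slide,
# the execution counts below are hypothetical.
WASTED_MS_PER_QUERY = 3

def wasted_time(executions, wasted_ms=WASTED_MS_PER_QUERY):
    """Return the cumulated wasted time, in seconds."""
    return executions * wasted_ms / 1000

for executions in (1_000, 1_000_000, 10_000_000):
    seconds = wasted_time(executions)
    print(f"{executions:>10,} executions -> {seconds:>8,.0f} s ({seconds / 60:,.1f} min)")
```

With one million executions the wasted time already reaches 50 minutes, which is why a millisecond-level gain can matter for OLTP statements while a gain of seconds on a one-off report may not.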
Comparing cold measure with hot measure (1)
• Full scan of a mid-sized table, tpcc:order_line,
containing 24 million rows
select * from order_line
onstat -g his output
« Cold » read (performed just after oninit -v): many disk pages read, 47.4 secs
« Hot » read (performed after the first scan): zero disk pages read, all buffer reads, 19.4 secs
Comparing cold measure with hot measure (2)
• Use of a poor selectivity index (duplicate index on ol_w_id, 50 distinct values)
select * from order_line where ol_w_id = 10
Cold read: many disk reads, execution time: 5.9 secs
Hot read: few disk reads, execution time: 1.1 secs
BATCHEDREAD_INDEX: description
• This feature has been taken from XPS and
introduced in 11.50.xC5.
• The purpose is to optimize index key access
by grouping the reading of many index keys into
large buffers, then fetching the rows associated
with those keys
• This technique brings strong savings in terms of
CPU and IO, therefore reducing Response Time.
• This technique is suitable and efficient for
massive index reads (DS/OLAP), not for
pinpoint-type (OLTP) index access.
BATCHEDREAD_INDEX: the test
• We will run the following query against a
30-million-row clients table. The table has an
index on ‘lastname’. Row size is 328 bytes
output to /dev/null
select lastname,count(*)
from clients
group by 1
• This query returns 2,188,286 rows
BATCHEDREAD_INDEX: facts
• All these response times are measured as « cold »
• AUTO_READAHEAD 0
BATCHEDREAD_INDEX 0
• AUTO_READAHEAD 0
BATCHEDREAD_INDEX 1
• AUTO_READAHEAD 1
BATCHEDREAD_INDEX 1
See the difference
BATCHEDREAD_INDEX: how ?
• Like BATCHEDREAD_TABLE, BATCHEDREAD_INDEX can be set
in the onconfig file
• Or used as an environment variable before
launching the application
export IFX_BATCHEDREAD_INDEX=1
• Or as an SQL statement
SET ENVIRONMENT IFX_BATCHEDREAD_INDEX '1';
• Monitor index scan activity with onstat -g scn
Attached or Detached Index?
• The « Antique Informix Disk Layout » used to create the index pages in the same
extents as the data pages for attached indexes. The expected result was
reduced disk IO.
• This layout became a problem because the data pages were often
located far from the index pages, causing the opposite effect of increasing disk IO.
The official recommendation at that time was to create detached indexes for this
reason.
• Nowadays, index pages are created in a different partition from the data pages,
so attached indexes have the same level of performance as
detached indexes.
• But if you can create the data dbspaces and the index
dbspaces on independent disks and channels, you will increase your disk IO
performance by reducing disk contention.
• This gain will be observed mainly during intensive sessions doing massive data
changes.
• Watch the output of onstat -g iof and look for low IO throughput per second.
Few columns or many columns in the same index?
Key points to consider
• Remember « cold » reads and « hot » reads when
testing the efficiency of an index. Results can be
dramatically different between cold and hot.
• The choice is, as often, a hard-won trade-off, and
definitely a long subject to discuss!
• Many columns in an index can make it more selective, but it
will also consume more CPU/disk resources when updating
keys (see b-tree cleaner tuning)
• Few columns in an index can make it less selective, but it
will consume fewer CPU/disk resources when updating keys
• Integrity constraints are not negotiable, but some integrity
constraint indexes can be negotiated…
Few columns or many columns?
Techniques to evaluate efficiency
• time dbaccess dbname queryfile gives an
indication of the efficiency of an index, but can be
misleading due to the huge differences between cold
and hot measures.
• onmode -Y sessnum 1 will identify which
index(es) are used, and will also report how many rows
have been scanned against how many rows have been
returned
• onstat -g his (sqltrace) will give fine detail
about response time, buffer and disk access, lock waits
etc…
• A complete diagnosis requires all 3 tools.
Few columns or many columns?
Let’s analyze a real case: one column
1 column index
Rows scanned: 4913
Buffer reads: 5900
Response time: 0.0368’’
Few columns or many columns?
Same case, index with 2 columns
2 columns index
Rows scanned: 106
Buffer reads: 122
Response time: 0.0047’’
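The two measurements above can be compared directly. The sketch below only divides the figures quoted on the slides; the dictionary keys and the helper name `improvement_factors` are illustrative assumptions.

```python
# Figures copied from the two slides above (1-column vs 2-column index).
one_column = {"rows_scanned": 4913, "buffer_reads": 5900, "response_time_s": 0.0368}
two_columns = {"rows_scanned": 106, "buffer_reads": 122, "response_time_s": 0.0047}

def improvement_factors(before, after):
    """Return, for each metric, how many times lower `after` is than `before`."""
    return {name: before[name] / after[name] for name in before}

factors = improvement_factors(one_column, two_columns)
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x lower with the 2-column index")
```

Rows scanned and buffer reads drop by a factor of roughly 46 to 48, while the response time drops by a factor of about 8: in a hot measure, the cost of each buffer read is small, so the gain in elapsed time is less spectacular than the gain in work done.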
Highly duplicated lead columns
indexes: how was life before?
• The Antique Informix Rule stated to avoid
multi-column indexes with low selectivity for the
leading keys, due to poor efficiency.
Ex: warehouse_id,district_id,order_id,order_line
• Querying on order_line required either specifying the
lead columns in the query predicate or creating
another index with order_line as lead column
• Restructuring indexes following those rules was a
complex, long and risky task, not to mention the
fact that any downtime due to index rebuilding
was poorly accepted by Operations Managers…
Index key first & self join : it’s magic!
• The key-first scan was introduced in 7.3. It has been enhanced so
that an index can be used even if the lead columns are not specified
in the where clause
• The index self join technique was introduced in IDS 11.10,
although many DBAs didn’t even notice it!
• By scanning subsets of the poorly selective composite index, the
engine manages to use the non-leading index keys as index
filters, transforming the index into a highly selective index.
• Hierarchical-like indexes with highly duplicated lead columns now
need no redefinition to be efficient.
• You need not build new indexes with highly selective lead
columns. This saves optimizer work and disk space.
• Index self join is enabled by default. If you insist on not
using it, you can disable it either by setting INDEX_SELFJOIN 0 in onconfig or
with an optimizer directive {+AVOID_INDEX_SJ}
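The idea of "scanning subsets" of a composite index can be illustrated with a toy model. This is a minimal sketch, not the engine's implementation: the index is assumed to be a sorted Python list of (lead, second, rowid) tuples, the data is hypothetical, and only the entries actually returned are counted (seek probes are ignored).

```python
from bisect import bisect_left, bisect_right

def build_composite_index(rows):
    """Return index entries (lead, second, rowid) in key order, like b-tree leaves."""
    return sorted((lead, second, rowid) for rowid, (lead, second) in enumerate(rows))

def key_first_scan(index, wanted_second):
    """Read every index entry and filter on the non-leading column."""
    entries_read = len(index)
    rowids = [rowid for _, second, rowid in index if second == wanted_second]
    return rowids, entries_read

def self_join_scan(index, wanted_second):
    """Visit each distinct lead value, then seek straight to (lead, wanted_second)."""
    rowids, entries_read, pos = [], 0, 0
    while pos < len(index):
        lead = index[pos][0]
        # Seek to the first entry >= (lead, wanted_second) within this lead value.
        hit = bisect_left(index, (lead, wanted_second), pos)
        while hit < len(index) and index[hit][:2] == (lead, wanted_second):
            rowids.append(index[hit][2])
            entries_read += 1  # counts matching entries only, not seek probes
            hit += 1
        # Jump to the first entry of the next lead value.
        pos = bisect_right(index, (lead, float("inf")), hit)
    return rowids, entries_read

# Hypothetical data: 5 lead values x 100 second values x 3 rows each.
rows = [(lead, second) for lead in range(5) for second in range(100) for _ in range(3)]
index = build_composite_index(rows)
full_rowids, full_read = key_first_scan(index, 42)
sj_rowids, sj_read = self_join_scan(index, 42)
print(f"key-first: {full_read} entries read, self join: {sj_read} entries read")
```

Both scans return the same rowids, but the second one touches only the entries that match, one small range per distinct lead value. The fewer distinct values the lead columns have, the fewer ranges must be visited, which is why highly duplicated lead columns stop being a handicap.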
Index self-join: the test
• We will use the order_line TPC-C table, that contains
23,735,211 rows
• The index follows the hierarchy, which was formerly
considered as a poor implementation:
ol_w_id: warehouse id (50 distinct values)
ol_d_id: district id (10 distinct values)
ol_o_id: order number ( 9279 distinct values)
ol_number: order line number (14 distinct values)
• The challenging query is
SELECT ol_d_id,ol_o_id,avg(ol_quantity),avg(ol_amount)
FROM order_line
GROUP BY 1,2
ORDER BY 2,3
No self join
• Use onmode -wm INDEX_SELFJOIN=0 to disable self join
Index is used, but key-first only
Many rows scanned
Response time: 11.258’’
Self join: find the differences!
Key-first + self join access
Rows scanned: about 100 times fewer
RT: 3.313’’
The Antique Informix Rule (AIR) says:
“you will use only one index per table”
• The Antique Informix Rule stated that only one
index per table could be used
• The optimizer had to choose only one index
among several indexes for the same table,
although several indexes were needed.
• Many fairly realistic queries had to be
drastically rewritten in order to provide
acceptable response times
• The trick was generally to use a UNION or a
nested query, but the readability and
maintainability of the query code suffered from that.
What A.I.R. obliged you to do
• Generally, the best way to work around the RT
issue was to use either UNION or nested queries
• The trick could be efficient in terms of Response
Time, but the code got more complex to read and
to maintain
• This workaround required heavy changes to the
application code, plus detailed and
accurate tests to confirm the same results as with
the initial query
The optimizer constantly getting
smarter across releases
• An optimizer enhancement introduced the use
of several indexes on the same table, but only
if the where clause conditions were linked with the ‘OR’
operator.
• The query path is like a usual INDEX PATH, the
difference being the use of several indexes
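The principle of resolving OR-linked conditions with several indexes on the same table can be sketched as follows. This is a simplified model under stated assumptions, not the engine's code: each index is a dictionary from key value to rowids, and the clients rows and column names (lastname, city, zipcode) are hypothetical.

```python
from collections import defaultdict

def build_index(table, column):
    """Map each value of `column` to the list of rowids holding it."""
    index = defaultdict(list)
    for rowid, row in enumerate(table):
        index[row[column]].append(rowid)
    return index

def multi_index_or(table, indexes, predicates):
    """Resolve `col1 = v1 OR col2 = v2 ...` using one index per predicate."""
    rowids = set()
    for column, value in predicates:
        rowids.update(indexes[column].get(value, []))
    # Fetch in rowid order so that each qualifying row is read exactly once.
    return [table[rowid] for rowid in sorted(rowids)]

# Hypothetical clients rows; column names are illustrative.
clients = [
    {"lastname": "Martin", "city": "Brest", "zipcode": "29200"},
    {"lastname": "Durand", "city": "Paris", "zipcode": "75001"},
    {"lastname": "Martin", "city": "Paris", "zipcode": "75011"},
    {"lastname": "Le Gall", "city": "Quimper", "zipcode": "29000"},
    {"lastname": "Bernard", "city": "Lyon", "zipcode": "69001"},
]
indexes = {column: build_index(clients, column) for column in ("lastname", "city", "zipcode")}
predicates = [("lastname", "Martin"), ("city", "Paris"), ("zipcode", "29000")]
result = multi_index_or(clients, indexes, predicates)
print(len(result), "rows returned using", len(predicates), "indexes")
```

Merging the rowid sets before touching the data rows is what removes the need for a hand-written UNION: a row matching two conditions (the third row here) is fetched once, and the query text stays a single readable SELECT.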
Measure with INDEX PATH
Use of 3 indexes!
Simple INDEX PATH
Scanned rows: 376,000
RT: 2.489’’
Disk reads: 34136
Multi index: different path
33% gain in RT
Multi-index / skip scan enabled
Response Time is shorter
3 indexes used
Disk reads: 1984
Multiple indexes:
what should be done?
• Generally, the optimizer correctly decides which path is best
• You can compare the results with the use of UNION, then decide
whether or not to keep hard-to-maintain code
• You can nonetheless use optimizer directives to force the access
method, like
{+ AVOID_MULTI_INDEX (clients)}
to force INDEX PATH
• Or
{+ MULTI_INDEX (clients)}
to force the multi-index SKIP SCAN path
• Choosing yourself can get tricky if both AND and OR conditions are
set on the involved indexes
• The difference is barely visible with a hot measure
• Statistics on indexes are very important: the access method can
change according to them!
Star join
• Star join is an extension of the MULTI INDEX concept
• It combines this technique with DYNAMIC HASH JOINS
• The technique has been ported from XPS to IDS 11.70
• It is used exclusively for DS/OLAP queries where a FACT
table is the center point of many dimension tables
• Requires PDQPRIORITY (Ultimate Edition or Enterprise
Edition)
• If you consider using Star Join, you are an excellent
candidate to see a demo of Informix Warehouse
Accelerator!
The A.I.R says:
« you will avoid indexes with too many tree levels »
• OK, but what can I do to solve that?
My indexes are built with the data they
have inside, and nothing or almost
nothing can be done
• Databases and tables are getting
bigger and bigger, and
splitting/archiving part of the data is
not always an acceptable solution
FOREST OF TREES INDEXES
• The forest of trees index type was
introduced in 11.70.xC1
• It replicates the model of a traditional
B-TREE, but with several root nodes instead of
only one root node
• The forest of trees brings benefits when
contention against the root node is observed
Reducing the number of b-tree levels
on index « lastname,firstname »
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree
=> The initial number of b-tree levels is 6
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 10 buckets
=> The number of b-tree levels decreased to 5
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 100 buckets
=> The number of b-tree levels decreased to 4
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 1000 buckets
=> The number of b-tree levels decreased to 3
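Why more buckets mean fewer levels can be seen with a deliberately simplified model: each bucket is treated as an independent b-tree holding an equal share of the keys, and every node is assumed to hold the same number of entries. The fanout of 100 and the even key distribution are assumptions for illustration; this sketch does not try to reproduce the level counts measured above.

```python
def btree_levels(keys, fanout):
    """Smallest number of levels such that fanout ** levels >= keys (at least 1)."""
    levels, capacity = 1, fanout
    while capacity < keys:
        capacity *= fanout
        levels += 1
    return levels

def fot_levels(keys, fanout, buckets):
    """Levels of one subtree when the keys are spread evenly over `buckets` trees."""
    keys_per_tree = -(-keys // buckets)  # ceiling division
    return btree_levels(keys_per_tree, fanout)

TOTAL_KEYS = 30_000_000   # size of the clients table used in the slides
ASSUMED_FANOUT = 100      # hypothetical entries per index node

print("plain b-tree:", btree_levels(TOTAL_KEYS, ASSUMED_FANOUT), "levels")
for buckets in (10, 100, 1000):
    print(f"{buckets:>5} buckets:", fot_levels(TOTAL_KEYS, ASSUMED_FANOUT, buckets), "levels")
```

In this model the depth shrinks with the logarithm of the bucket count, so each additional level saved requires many more buckets. The real figures depend on key size, page size and duplicate handling, which is why they should be read from oncheck output rather than estimated.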
Tpcc with regular b-tree indexes
• Index iu_stock_01 has 4 levels
Tpcc result is 14093 tpmC
High contention on
iu_stock_01: 8,704,052 spins
in 4 min
Tpcc with FOT on iu_stock_01
• create unique index iu_stock_01 on stock (s_w_id,s_i_id)
using btree in data03 HASH on (s_w_id) with 50 buckets;
• Index iu_stock_01 now has 3 levels
Result grew to 16413 tpmC
Contention on iu_stock_01
decreased from 8,704,000
to 149,600 spins in 4 min
iu_oorder_01 is now a good
candidate for FOT!
Main facts on FOT indexes
• FOT is very efficient at reducing contention on index
access => Better RT in OLTP context
• FOT is very efficient at reducing the number of B-TREE levels => Better
overall RT
• Ideal for primary keys and foreign keys in a high
concurrency OLTP context
• Implementation is easy and fast
• Supports main index functionality: ER, PK, FK, b-tree
cleaning…
• Does not support aggregate queries or range scans on the HASH
ON columns
• Also does not support index clustering, index fillfactor and
functional (UDR-based) indexes
Optimizing big index creation:
PSORT_NPROCS
• The PSORT_NPROCS env variable is used to allocate more
threads to the sort package, which is also used for parallel
index creation.
• Significant performance improvements on index creation
can be obtained on multi-core/multi-processor servers
• It can be used even with non PDQPRIORITY-enabled
editions if the server has more than one core/CPU.
• PSORT_NPROCS can sharply increase memory consumption:
please check the available memory on the server.
• The onconfig parameter DS_NONPDQ_QUERY_MEM has to
be checked when using PSORT_NPROCS.
Optimizing big index creation
DBSPACETEMP or PSORT_DBTEMP
• The DBSPACETEMP env variable overrides the
onconfig parameter of the same name.
• Generally, raw-device-based temp dbspaces offer
more performance than file-system-based files.
• PSORT_DBTEMP writes temporary sort files to the
specified file-system-based directories instead of
DBSPACETEMP.
• It is useful for spreading the temporary sort files across a
wider list of directories mounted on different
spindles
PSORT_NPROCS/PSORT_DBTEMP:
facts
• create index id_clients_02 on clients(lastname,firstname)
• unset PSORT_NPROCS
unset PSORT_DBTEMP
=> 13m28.709s
• export PSORT_NPROCS=3
export PSORT_DBTEMP=/tmp:/ids_chunks/ids_space01:/ids_chunks/ids_space02:/ids_chunks/ids_space03
=> 6m19
• A RAM disk, or even an SSD drive, can improve performance a lot:
export PSORT_NPROCS=3
export PSORT_DBTEMP=/mnt/myramdisk
=> 4m22.030s
• To check the environment of the session:
onstat -g env SessionNumber
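The three elapsed times above translate into the following speed-up factors. The sketch only converts and divides the figures quoted on the slide; the labels and the helper `to_seconds` are illustrative, and "6m19" is assumed to mean 6 minutes 19 seconds.

```python
def to_seconds(elapsed):
    """Convert a `time`-style string such as '13m28.709s' into seconds."""
    minutes, seconds = elapsed.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

# Elapsed times copied from the slide above.
runs = {
    "no PSORT settings": "13m28.709s",
    "PSORT_NPROCS=3, 4 directories": "6m19",
    "PSORT_NPROCS=3, RAM disk": "4m22.030s",
}

baseline = to_seconds(runs["no PSORT settings"])
speedups = {label: baseline / to_seconds(elapsed) for label, elapsed in runs.items()}
for label, factor in speedups.items():
    print(f"{label}: {factor:.2f}x")
```

On this test the index build is a little over twice as fast with parallel sort threads and spread sort directories, and about three times as fast when the sort files go to a RAM disk.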
Index disable: What happens?
• Disabling an existing index will prevent the server from using this
index, but the server will « remember » the index schema.
• This technique can be applied before executing massive data inserts
or updates, since it removes the index key update workload.
• Heavy side effects can be expected: loss of key uniqueness, loss of
performance…
• If you run a query that would use a disabled index, the optimizer will probably
choose a sequential scan unless a better path is found.
• The index will be shown as ‘disabled’ in dbschema, but will not be
shown in oncheck -pT nor oncheck -pe
• Disabling an index will make its former disk space available in the
dbspace
• Disabling an index is immediate
• Syntax is: set indexes IndexName disabled
Index enable: what happens?
• Enabling an index will rebuild the index physically,
with the same definition as before
• Enabling an index takes as much time as creating
the same index
• But the enable statement is simpler to type than the
create index statement :-)
• Plus, you do not have to remember the initial create
index statement :-)
• Syntax is: set indexes IndexName enabled
Digging for more performance:
Disable foreign key indexes
• Often, the foreign key columns are a part of the same table’s primary
key.
• order_line primary key (ol_w_id,ol_d_id,ol_o_id,ol_number)
order_line foreign key (ol_w_id,ol_d_id,ol_o_id)
• Using ‘INDEX DISABLED’ in the add constraint statement avoids the
creation of a redundant index, because its structure already exists
in the primary key.
• ALTER TABLE order_line ADD CONSTRAINT(FOREIGN KEY (ol_w_id,ol_d_id,ol_o_id)
REFERENCES oorder(o_w_id,o_d_id,o_id) CONSTRAINT ol2 INDEX DISABLED);
• This implementation saves disk space by dropping an index
• CPU resources will be saved when updating/deleting/creating index keys,
• and consequently disk IO will also be saved.
• Check that disabling the constraint index has no hidden side effects: a
mistake can have expensive consequences!
I need to create a new index,
but users are always connected to the table!
• Sometimes a new index needs to be created, but
the tables are being accessed by users or batches.
• IDS 11.10 introduced the possibility of creating an
index without putting an exclusive lock on the table,
called index online.
• Users can SELECT, INSERT, UPDATE or DELETE rows
in the table while the index is being created
• Syntax is:
create index id_clients_01 on clients(lastname,firstname) ONLINE
• Drop index online is also available under the same
conditions
Create index online:
precautions & restrictions
• Create index online is a complex operation, involving
a table snapshot, base index build, catch-up and more.
• It will request additional resources, such as disk space, CPU
and memory, in order to make the operation safe and as
fast as possible.
• Long transactions may happen: check the logical log size
before diving in
• The index pre-image pool memory size is managed with the
onconfig parameter ONLIDX_MAXMEM, updatable with
onmode -wm
• Not applicable to cluster indexes, UDT columns or UDR
indexes
• Only one create index online per table at a time
Index compression
• IDS introduced table compression in 11.50.xC4. This technology is now
used successfully in large database implementations.
• Index compression is a new feature of IDS 12.10. It is based on the
same technology as table compression.
• The principle is to compress the key column values at b-tree leaf level,
but not the rowids attached to these key values
• Index compression is very effective for indexes having large key values:
names, item names etc…
• The compression dictionary must contain at least 2000 unique key
values
• Index compression is an excellent way to save disk space, and …
• Since more key values fit in an index page, more key values can be read
in one IO cycle => IO is more efficient
• Reducing IO should enhance index access performance in large queries
Index compression:
Disk space gained
• Execute function task ("index compress", "id_clients_01", "staging");
• Or
execute function task("index compress", "j", "testdb");
• Or
create index id_clients_01 on clients(lastname,firstname) compressed
More than 50% compression rate
Cluster index
• Creating or altering a cluster index physically sorts
the table data by the first column of this index at creation
time
• Accessing table data through a cluster index will read already
sorted data pages.
• Generally makes IO on data pages easier because they are
contiguous => Decreases RT
• The clustering level will decrease as new rows are
inserted
• High cost of administration: re-clustering this index will
rewrite the table data pages
• A cluster index can be good for stable tables accessed in an
ordered sequential way
Statistics on indexes
• Introduced in 11.70: when one creates an index,
the distributions for this index are automatically
created
• High mode statistics are generated for the lead
column
• Index level statistics are also generated in low
mode
• This does not exempt you from regularly updating
statistics for those indexes, but it is no longer
required to do so just after the index creation
Questions?
Indexing techniques: which one to use when
Eric Vercelletto Begooden IT Consulting eric.vercelletto@begooden-it.com
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 

A12 vercelletto indexing_techniques

  • 1. IBM Informix indexing techniques: which one to use when? Eric Vercelletto, Begooden IT Consulting. Session A12, 4/23/2013 3:35 PM
  • 2. Agenda / methodology • Introduction to response time measurement • Identify the relevant indexing techniques • Describe the implementation method • Confirm/recognize its use through accurate monitoring • Measure its efficiency in terms of response time and effective use in the database (sqltrace, sqexplain) • Identify pros and cons
  • 3. Introduction • Begooden IT Consulting is an IBM ISV company, mainly focused on Informix technology services. • Our 15+ years of experience within Informix Software France and Portugal helped us acquire in-depth product knowledge as well as solid field experience. • Our services include Informix implementation auditing, performance tuning, issue management, administration mentoring … • We also happen to be the Querix reseller for France and French-speaking countries (except Québec and Louisiana) • The company is based in Pont l’Abbé, Finistère, France
  • 4. Some basics not to forget about • There are 2 ways to measure response times (RT) • The "cold" measure: the response time is measured just after starting the engine, when data and index pages are not yet loaded into the Informix shared memory (SHMEM) buffers. Disk I/O must be performed to read the data and index pages, which increases the RT. • The "hot" measure: RT is measured when data and index pages are already loaded into SHMEM. Little or no disk I/O => RT is much shorter. • This point often explains surprising RT differences, depending on how the data is accessed. • Broad-range or DS (decision support) queries most often access data and/or indexes in disk pages • OLTP queries mostly access data and indexes in SHMEM pages
  • 5. Derived thoughts and facts • Reading data pages and/or index pages from disk always takes longer than reading them from SHMEM. Full table scans can take minutes or more, depending on table size. • Reading data pages in SHMEM is very fast. A full scan of a table in SHMEM takes fractions of a second or a few seconds, rarely more. • Reading index pages in SHMEM is also very fast. In addition, due to the B-tree structure, reading index pages generally covers more content than reading data pages. • This often makes it difficult to compare the efficiency of 2 different indexes on the same table when reading from SHMEM.
  • 6. Derived thoughts and facts (continued) • When running hot measures on indexes, the differences can be as low as milliseconds BUT … • Repeating 3 unnecessary milliseconds millions of times can make a difference! • When response times get to such a low level, sqltrace is the tool you need to understand the query behaviour. • In certain situations, saving milliseconds on a query will make the difference. In other situations, saving seconds will not. • A bad response time can be caused by inappropriate indexing, but can also be caused by some "unusual" logic that makes the applications and the server perform unnecessary work.
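To make the "3 milliseconds repeated millions of times" point concrete, here is a small arithmetic sketch in Python. Only the 3 ms figure comes from the slide; the execution counts are hypothetical and chosen purely for illustration.

```python
# Cumulative cost of a small per-query overhead.
# The 3 ms figure is from the slide; the execution counts are hypothetical.

def wasted_seconds(overhead_ms: float, executions: int) -> float:
    """Total time lost, in seconds, when each execution wastes overhead_ms."""
    return overhead_ms * executions / 1000.0


if __name__ == "__main__":
    for executions in (1_000_000, 10_000_000, 100_000_000):
        seconds = wasted_seconds(3, executions)
        print(f"{executions:>11,} executions x 3 ms = {seconds / 3600:.2f} hours")
```

Whether such a saving matters depends on how often the statement really runs, which is exactly what sqltrace helps establish.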
  • 7. Comparing cold measure with hot measure (1) • Full scan of a mid-sized table, tpcc:order_line, containing 24 million rows: select * from order_line • onstat -g his output • "Cold" read, performed just after oninit -v: many disk pages read, 47.4 secs • "Hot" read, performed after the first scan: zero disk pages read, all buffer reads, 19.4 secs
  • 8. Comparing cold measure with hot measure (2) • Cold use of a poor selectivity index: select * from order_line where ol_w_id = 10 (duplicate index on w_id, 50 distinct values) • Cold read: many disk reads, execution time 5.9 secs • Hot read: few disk reads, execution time 1.1 secs
  • 9. BATCHEDREAD_INDEX: description • This feature was taken from XPS and introduced in 11.50.xC5. • The purpose is to optimize index key access by grouping the reads of many index keys into large buffers, then fetching the rows associated with those keys. • This technique brings strong savings in CPU and I/O, and therefore reduces response time. • This technique is suitable and efficient for massive index reads (DS/OLAP), not for pinpoint-type (OLTP) index access.
  • 10. BATCHEDREAD_INDEX: the test • We will run the following query against a 30-million-row clients table. The table has an index on ‘lastname’. Row size is 328 bytes. • output to /dev/null select lastname,count(*) from clients group by 1 • This query returns 2,188,286 rows
  • 11. BATCHEDREAD_INDEX: facts • All these response times are measured "cold" • AUTO_READAHEAD 0, BATCHEDREAD_INDEX 0 • AUTO_READAHEAD 0, BATCHEDREAD_INDEX 1 • AUTO_READAHEAD 1, BATCHEDREAD_INDEX 1 • See the difference
  • 12. BATCHEDREAD_INDEX: how? • BATCHEDREAD_INDEX can be set, as well as BATCHEDREAD_TABLE, either in the onconfig file • Or as an environment variable before launching the application: export IFX_BATCHEDREAD_INDEX=1 • Or with an SQL statement: SET ENVIRONMENT IFX_BATCHEDREAD_INDEX '1'; • Monitor index scan activity with onstat -g scn
  • 13. Attached or detached index? • The "Antique Informix Disk Layout" used to create the index pages of attached indexes in the same extents as the data pages. The expected result was reduced disk I/O. • This layout became a problem because the data pages were often located far from the index pages, which had the opposite effect of increasing disk I/O. For this reason, the official recommendation at the time was to create detached indexes. • Nowadays, index pages are created in a different partition from the data pages, so attached indexes perform at the same level as detached indexes. • But if you can create the data dbspaces and the index dbspaces on independent disks and channels, you will increase your disk I/O performance by reducing disk contention. • This gain will be observed mainly during intensive sessions doing massive data changes. • Watch the output of onstat -g iof and look for low I/O throughput per second.
  • 14. Few columns or many columns in the same index? Key points to consider • Remember "cold" reads and "hot" reads when testing the efficiency of an index. Results can be dramatically different between cold and hot. • The choice is, as often, a hard-to-reach trade-off, and definitely a long subject to discuss! • Many columns in an index can make it more selective, but it will also consume more CPU/disk resources when updating keys (see b-tree cleaner tuning) • Few columns in an index can make it less selective, but it will consume fewer CPU/disk resources when updating keys • Integrity constraints are not negotiable, but some integrity constraint indexes can be negotiated…
  • 15. Few columns or many columns? Techniques to evaluate efficiency • time dbaccess dbname queryfile gives an indication of the efficiency of an index, but can be misleading because of the huge differences between cold and hot measures. • onmode -Y sessnum 1 identifies which index(es) are used, and also reports how many rows were scanned against how many rows were returned • onstat -g his (sqltrace) gives fine detail about response time, buffer and disk access, lock waits etc. • A complete diagnosis requires all 3 tools.
  • 16. Few columns or many columns? Let’s analyze a real case: 1-column index • Rows scanned: 4913 • Response time: 0.0368 secs • Buffer reads: 5900
  • 17. Few columns or many columns? Same case, index with 2 columns • Rows scanned: 106 • Response time: 0.0047 secs • Buffer reads: 122
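The two measurements above can be put side by side with a few lines of Python. This is only a convenience sketch: the figures are the ones reported on the two slides, while the dictionary layout and the `improvement_factors` helper are illustrative names, not part of any Informix tooling.

```python
# Figures reported on the slides for the same query with a 1-column
# and a 2-column index. The helper name and data layout are illustrative.
ONE_COLUMN = {"rows_scanned": 4913, "response_time_s": 0.0368, "buffer_reads": 5900}
TWO_COLUMNS = {"rows_scanned": 106, "response_time_s": 0.0047, "buffer_reads": 122}


def improvement_factors(before: dict, after: dict) -> dict:
    """Return the before/after ratio for every metric present in both dicts."""
    return {name: before[name] / after[name] for name in before if name in after}


if __name__ == "__main__":
    for name, factor in improvement_factors(ONE_COLUMN, TWO_COLUMNS).items():
        print(f"{name}: {factor:.1f}x lower with the 2-column index")
```

Rows scanned and buffer reads drop by a much larger factor than the response time, which is consistent with the earlier remark that hot measures hide part of the difference between two indexes.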
  • 18. Highly duplicated lead column indexes: how was life before? • The Antique Informix Rule stated that multi-column indexes with low selectivity on the leading keys should be avoided, due to poor efficiency. Ex: warehouse_id,district_id,order_id,order_line • Querying on order_line required either specifying the lead columns in the query predicate, or creating another index with order_line as lead column • Restructuring indexes following those rules was a complex, long and risky task, not to mention the fact that any downtime due to index rebuilding was poorly accepted by Operations Managers…
  • 19. Index key-first & self join: it’s magic! • The key-first scan was introduced in 7.3. It has been enhanced so that an index can be used even if the lead columns are not specified in the where clause • The index self join technique was introduced in IDS 11.10, although many DBAs didn’t even notice it! • By scanning subsets of the poorly selective composite index, the engine manages to use the non-leading index keys as index filters, turning the index into a highly selective one. • Hierarchical-like indexes with highly duplicated lead columns now need no redefinition to be efficient. • You do not need to build new indexes with highly selective lead columns. This saves optimizer work and disk space. • Index self join is enabled by default. If you insist on not using it, you can disable it either by setting INDEX_SELFJOIN 0 in onconfig or with the optimizer directive {+AVOID_INDEX_SJ}
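The idea of scanning subsets of a composite index can be illustrated with a toy model in Python. This is a conceptual sketch only, not Informix's implementation: the index is modelled as a sorted list of key tuples, the function names and table sizes are hypothetical, and the "keys examined" counter deliberately ignores the cost of the binary-search probes.

```python
from bisect import bisect_left


def build_index(warehouses: int, districts: int, orders: int, lines: int) -> list:
    """Sorted composite keys (w_id, d_id, o_id, line_no), like a B-tree leaf level."""
    return sorted(
        (w, d, o, n)
        for w in range(1, warehouses + 1)
        for d in range(1, districts + 1)
        for o in range(1, orders + 1)
        for n in range(1, lines + 1)
    )


def full_scan(index: list, o_id: int) -> tuple:
    """Read every key and filter on o_id, ignoring the key ordering."""
    matches = [key for key in index if key[2] == o_id]
    return matches, len(index)  # (matching keys, keys examined)


def self_join_scan(index: list, o_id: int) -> tuple:
    """For each distinct (w_id, d_id) prefix, probe only the o_id range."""
    matches, examined, pos = [], 0, 0
    while pos < len(index):
        w, d = index[pos][0], index[pos][1]
        lo = bisect_left(index, (w, d, o_id))      # first key with this o_id
        hi = bisect_left(index, (w, d, o_id + 1))  # first key past this o_id
        matches.extend(index[lo:hi])
        examined += 1 + (hi - lo)  # one prefix key read plus the matching keys
        pos = bisect_left(index, (w, d + 1))  # jump to the next lead-key prefix
    return matches, examined


if __name__ == "__main__":
    idx = build_index(warehouses=5, districts=10, orders=100, lines=5)
    same, scanned_all = full_scan(idx, 42)
    probed, scanned_sj = self_join_scan(idx, 42)
    assert same == probed
    print(f"full scan examined {scanned_all} keys, self-join style {scanned_sj}")
```

In this model the lead columns have few distinct values, so iterating over their combinations and probing each one is far cheaper than reading the whole index, which is the intuition behind keeping hierarchical indexes as they are.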
  • 20. Index self join: the test • We will use the order_line TPC-C table, which contains 23,735,211 rows • The index follows the hierarchy, which was formerly considered a poor implementation: ol_w_id: warehouse id (50 distinct values); ol_d_id: district id (10 distinct values); ol_o_id: order number (9279 distinct values); ol_number: order line number (14 distinct values) • The challenging query is: SELECT ol_d_id,ol_o_id,avg(ol_quantity),avg(ol_amount) FROM order_line GROUP BY 1,2 ORDER BY 2,3
  • 21. No self join • Use onmode -wm INDEX_SELFJOIN=0 to disable self join • The index is used, but only key-first • Many rows scanned • Response time: 11.258 secs
  • 22. Self join: find the differences! • Key-first + self join access • Rows scanned: about 100 times fewer • RT: 3.313 secs
  • 23. The Antique Informix Rule says: “you will use only one index per table”
  • 24. The AIR says: “you will use only one index per table” • The Antique Informix Rule (AIR) stated that only one index per table could be used • The optimizer had to choose a single index among several on the same table, even when several indexes were needed. • Many fairly realistic queries had to be drastically rewritten in order to provide acceptable response times • The trick was generally to use a UNION or a nested query, but the readability and maintainability of the query code suffered from it.
  • 25. What A.I.R. obliged you to do • Generally, the best way to work around the RT issue was to use either UNION or nested queries • The trick could be efficient in terms of response time, but the code became more complex to read and to maintain • This workaround required heavy changes to the application code, plus detailed and accurate tests to check that the results matched those of the initial query
  • 26. The optimizer keeps getting smarter across releases • An optimizer enhancement introduced the use of several indexes on the same table, but only if the where clauses were linked with the ‘OR’ operator. • The query path looks like a usual INDEX PATH, the difference being the use of several indexes
  • 27. Measure with INDEX PATH • Use of 3 indexes! • Simple INDEX PATH • Scanned rows: 376,000 • RT: 2.489 secs • Disk reads: 34136
  • 28. Multi index: different path • Multi-index / skip scan enabled • 3 indexes used • Response time is shorter: 33% gain in RT • Disk reads: 1984
  • 29. Multiple indexes: what should be done? • Generally, the optimizer correctly decides which path is best • You can compare the results with those of the UNION rewrite, then decide whether or not to keep hard-to-maintain code • You can nonetheless use optimizer directives to force the access method, like {+ AVOID_MULTI_INDEX (clients)} to force INDEX PATH • Or {+ MULTI_INDEX (clients)} to force the multi-index SKIP SCAN path • Choosing by yourself can get tricky if both AND and OR conditions are set on the involved indexes • The difference is almost invisible with hot measures • Statistics on indexes are very important: the access method can change according to them!
  • 30. Star join • Star join is an extension of the MULTI INDEX concept • It combines this technique with DYNAMIC HASH JOINS • The technique was ported from XPS to IDS 11.70 • It is used exclusively for DS/OLAP queries where a FACT table is the center point of many dimension tables • Requires PDQPRIORITY (Ultimate Edition or Enterprise Edition) • If you are considering Star Join, you are an excellent candidate to see a demo of Informix Warehouse Accelerator!
  • 31. The A.I.R. says: "you will avoid indexes with too many tree levels" • OK, but what can I do to solve that? My indexes are built from the data they contain, and nothing or almost nothing can be done • Databases and tables are getting bigger and bigger, and splitting/archiving part of the data is not always an acceptable solution
  • 32. FOREST OF TREES INDEXES • The forest of trees (FOT) index type was introduced in 11.70.xC1 • It replicates the model of a traditional B-tree, but with several root nodes instead of only one • The forest of trees brings benefits when contention on the root node is observed
  • 33. Reducing the number of b-tree levels on index "lastname,firstname" • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree => The initial number of b-tree levels is 6 • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree hash on (lastname) with 10 buckets => The number of b-tree levels decreases to 5 • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree hash on (lastname) with 100 buckets => The number of b-tree levels decreases to 4 • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree hash on (lastname) with 1000 buckets => The number of b-tree levels decreases to 3
  • 34. TPC-C with regular b-tree indexes • Index iu_stock_01 has 4 levels • TPC-C result is 14093 tpmC • High contention on iu_stock_01: 8,704,052 spins in 4 minutes
  • 35. TPC-C with FOT on iu_stock_01 • create unique index iu_stock_01 on stock (s_w_id,s_i_id) using btree in data03 HASH on (s_w_id) with 50 buckets; • Index iu_stock_01 now has 3 levels • Result grew to 16413 tpmC • Contention on iu_stock_01 decreased from 8,704,000 to 149,600 spins in 4 minutes • iu_oorder_01 is now a good candidate for FOT!
  • 36. Main facts on FOT indexes • FOT is very efficient at reducing contention on index access => Better RT in an OLTP context • FOT is very efficient at reducing the number of B-tree levels => Better overall RT • Ideal for primary keys and foreign keys in a high-concurrency OLTP context • Implementation is easy and fast • Supports the main index functionality: ER, PK, FK, b-tree cleaning… • Does not support aggregate queries or range scans on the HASH ON columns • Also does not support index clustering, index fillfactor or functional (UDR-based) indexes
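The forest-of-trees principle can be pictured with a toy model in Python. This is a sketch under loud assumptions, not Informix's actual structure: each "subtree" is just a sorted list, the bucket is chosen with Python's built-in `hash` on the lead column, and the class and method names are hypothetical.

```python
from bisect import bisect_left, insort


class ForestOfTrees:
    """Toy model: one sorted key list (a "subtree") per hash bucket."""

    def __init__(self, buckets: int):
        self.subtrees = [[] for _ in range(buckets)]

    def _bucket(self, lead_value) -> int:
        # The lead (HASH ON) column alone decides which subtree holds the key.
        return hash(lead_value) % len(self.subtrees)

    def insert(self, key: tuple) -> None:
        insort(self.subtrees[self._bucket(key[0])], key)

    def contains(self, key: tuple) -> bool:
        subtree = self.subtrees[self._bucket(key[0])]
        pos = bisect_left(subtree, key)
        return pos < len(subtree) and subtree[pos] == key

    def largest_subtree(self) -> int:
        return max(len(subtree) for subtree in self.subtrees)


if __name__ == "__main__":
    single, forest = ForestOfTrees(1), ForestOfTrees(50)
    for w_id in range(1, 51):          # 50 warehouses, as in the TPC-C stock index
        for i_id in range(1, 101):     # hypothetical 100 items per warehouse
            single.insert((w_id, i_id))
            forest.insert((w_id, i_id))
    print("largest subtree with 1 bucket:  ", single.largest_subtree())
    print("largest subtree with 50 buckets:", forest.largest_subtree())
```

Each subtree holds only a fraction of the keys, which is why a real tree built on it needs fewer levels and why concurrent sessions rarely meet on the same root. The model also hints at the range scan restriction: neighbouring lead values land in different subtrees, so no single tree can be walked in lead-column order.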
  • 37. Optimizing big index creation: PSORT_NPROCS • The PSORT_NPROCS environment variable is used to allocate more threads to the sort package, which is also used for parallel index creation. • Significant performance improvements on index creation can be obtained on multi-core/multi-processor servers • It can be used even with editions that do not support PDQPRIORITY, provided the server has more than one core/CPU. • PSORT_NPROCS can sharply increase memory consumption: please check the available memory on the server. • The onconfig parameter DS_NONPDQ_QUERY_MEM should be checked when using PSORT_NPROCS.
  • 38. Optimizing big index creation: DBSPACETEMP or PSORT_DBTEMP • The DBSPACETEMP environment variable overrides the onconfig parameter of the same name. • Generally, temp dbspaces based on raw devices perform better than those based on file system files. • PSORT_DBTEMP writes temporary sort files to the specified file system directories instead of DBSPACETEMP. • It is useful for spreading the temporary sort files over a wider list of directories mounted on different spindles
  • 39. PSORT_NPROCS/PSORT_DBTEMP: facts • create index id_clients_02 on clients(lastname,firstname) • unset PSORT_NPROCS; unset PSORT_DBTEMP => 13m28.709s • export PSORT_NPROCS=3; export PSORT_DBTEMP=/tmp:/ids_chunks/ids_space01:/ids_chunks/ids_space02:/ids_chunks/ids_space03 => 6m19 • A RAM disk, or even an SSD drive, can improve performance a lot: export PSORT_NPROCS=3; export PSORT_DBTEMP=/mnt/myramdisk => 4m22.030s • To check the environment of the session: onstat -g env SessionNumber
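The three timings above are easier to compare once converted to seconds. The Python sketch below does that for values in the shell `time` style; the elapsed times are those reported on the slide, while the `parse_elapsed` helper and the run labels are illustrative names introduced here.

```python
import re


def parse_elapsed(text: str) -> float:
    """Convert a `time`-style value such as '13m28.709s' or '6m19' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised elapsed time: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Elapsed times as reported on the slide; the labels are descriptive only.
RUNS = {
    "no PSORT settings": "13m28.709s",
    "PSORT_NPROCS=3, four sort directories": "6m19",
    "PSORT_NPROCS=3, RAM disk": "4m22.030s",
}

if __name__ == "__main__":
    baseline = parse_elapsed(RUNS["no PSORT settings"])
    for label, elapsed in RUNS.items():
        seconds = parse_elapsed(elapsed)
        print(f"{label}: {seconds:.1f} s, {baseline / seconds:.2f}x vs baseline")
```

These are single runs on one server, so the ratios are indicative rather than a general rule.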
  • 40. Index disable: what happens? • Disabling an existing index prevents the server from using it, but the server "remembers" the index schema. • This technique can be applied before executing massive data inserts or updates, since it removes the index key update workload. • Heavy side effects can be expected: loss of key uniqueness, loss of performance… • If you run a query that would have used the disabled index, the optimizer will probably choose a sequential scan unless a better path is found. • The index will be shown as ‘disabled’ in dbschema, but will not be shown in oncheck -pT nor oncheck -pe • Disabling an index makes its former disk space available in the dbspace • Disabling an index is immediate • Syntax is: set indexes IndexName disabled
  • 41. Index enable: what happens? • Enabling an index physically rebuilds the index, with the same definition as before • Enabling an index takes as much time as creating the same index • But the enable statement is simpler to type than the create index statement • And you do not have to remember the initial create index statement • Syntax is: set indexes IndexName enabled
  • 42. Digging for more performance: disable foreign key indexes • Foreign key indexes are often part of the same table’s primary key. • order_line primary key (ol_w_id,ol_d_id,ol_o_id,ol_number); order_line foreign key (ol_w_id,ol_d_id,ol_o_id) • Using ‘index disabled’ in the add constraint statement avoids creating an unnecessary index, because its structure already exists in the primary key. • ALTER TABLE order_line ADD CONSTRAINT(FOREIGN KEY (ol_w_id,ol_d_id,ol_o_id) REFERENCES oorder(o_w_id,o_d_id,o_id) CONSTRAINT ol2 INDEX DISABLED); • This implementation saves disk space by dropping an index • CPU resources are saved when updating/deleting/creating index keys, • and consequently disk I/O is also saved. • Check that disabling the constraint index has no hidden side effects: a mistake can have expensive consequences!
  • 43. I need to create a new index, but users are always connected to the table! • Sometimes a new index needs to be created while the table is being accessed by users or batches. • IDS 11.10 introduced the ability to create an index without putting an exclusive lock on the table, known as online index creation. • Users can SELECT, INSERT, UPDATE or DELETE rows in the table while the index is being created • Syntax is: create index id_clients_01 on clients(lastname,firstname) ONLINE • Drop index online is also available under the same conditions
  • 44. Create index online: precautions & restrictions • Create index online is a complex operation, involving a table snapshot, a base index build, a catch-up phase and more. • It requires additional resources, such as disk space, CPU and memory, in order to make the operation safe and as fast as possible. • Long transactions may happen: check the logical log size before diving in • The size of the index pre-image pool memory is managed with the onconfig parameter ONLIDX_MAXMEM, which can be updated with onmode -wm • Not applicable to cluster indexes, UDT columns or UDR indexes • Only one create index online per table at a time
  • 45. Index compression • IDS introduced table compression in 11.50 xC4. This technology is now used successfully in large database implementations. • Index compression is a new feature of IDS 12.10. It is based on the same technology as table compression. • The principle is to compress the key column values at the b-tree leaf level, but not the rowids attached to these key values • Index compression is very effective for indexes having large key values: names, item names etc. • The compression dictionary must contain at least 2000 unique key values • Index compression is an excellent way to save disk space, and … • Since more key values fit in an index page, more key values can be read in one IO cycle => IO is more efficient • Reducing IO should improve index access performance in large queries
  • 46. Index compression: Disk space gained • Execute function task("index compress", "id_clients_01", "staging"); • Or execute function task("index compress", "j", "testdb"); • Or create index id_clients_01 on clients(lastname,firstname) compressed • Result: more than 50% compression rate
  • 47. Cluster index • Creating an index as a cluster index, or altering an index to cluster, physically sorts the table data in the order of this index's key at that time • Accessing table data through a cluster index reads data pages that are already sorted. • This generally makes IO on data pages cheaper because the pages are contiguous => decreased RT • The cluster level decreases as new rows are inserted • High cost of administration: re-clustering this index rewrites the table data pages • A cluster index can be good for stable tables accessed in an ordered, sequential way
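The statements involved can be sketched as follows. The index name id_clients_cl is a hypothetical name chosen for this example, and the catalog query assumes that statistics have been updated recently, since the clustering figures in sysindexes are refreshed by UPDATE STATISTICS.

```sql
-- Create the index and physically reorder the table rows in index order.
-- id_clients_cl is a hypothetical index name.
CREATE CLUSTER INDEX id_clients_cl ON clients(lastname);

-- Follow how the clustering degrades as rows are inserted:
-- clustered = 'C' marks a cluster index; for clust, smaller values
-- correspond to greater clustering.
UPDATE STATISTICS LOW FOR TABLE clients;
SELECT idxname, clustered, clust, nunique, leaves
  FROM sysindexes
 WHERE idxname = 'id_clients_cl';

-- Re-cluster during a maintenance window: this rewrites the data pages.
ALTER INDEX id_clients_cl TO NOT CLUSTER;
ALTER INDEX id_clients_cl TO CLUSTER;
```

Only one cluster index can exist per table, so the choice of key should follow the dominant ordered access path.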
  • 48. Statistics on indexes • Introduced in 11.70: when an index is created, the distributions for this index are created automatically • High mode statistics are generated for the lead column • Index-level statistics are also generated, as in low mode • This does not exempt you from regularly updating statistics for those indexes, but it is no longer required to do it just after the index creation
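A regular refresh for the clients index used in the earlier slides might look like the sketch below. The choice of HIGH for the lead column and MEDIUM for the second column is a common convention assumed here, not something stated on the slide.

```sql
-- Index-level and table-level statistics.
UPDATE STATISTICS LOW FOR TABLE clients;

-- Distributions: HIGH on the lead column of the index, MEDIUM on the other one.
UPDATE STATISTICS HIGH FOR TABLE clients(lastname);
UPDATE STATISTICS MEDIUM FOR TABLE clients(firstname);

-- Check which distributions exist and when they were built
-- (mode: 'H' = high, 'M' = medium).
SELECT c.colname, d.mode, d.constructed, d.resolution
  FROM sysdistrib d, syscolumns c, systables t
 WHERE t.tabname = 'clients'
   AND c.tabid = t.tabid
   AND d.tabid = c.tabid
   AND d.colno = c.colno
   AND d.seqno = 1;
```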
  • 49. Questions? Indexing techniques: which one to use when Eric Vercelletto Begooden IT Consulting [email protected]