Mastering GeoServer

DBMS Connection Parameters Explained

Ing. Simone Giannecchini


Ing. Andrea Aime
GeoSolutions SAS
Version 1.2
22/01/2014

Table of Contents 
GeoServer DataStores and DBMS Connections
General considerations
Internal Connection Pool
Prepared statement notes
A few more thoughts
JNDI Connection Pool
Configuring pools for production
Connection waiting time and relation with other params
Maximizing sharing of pools
Validation queries

   

© 2016 GeoSolutions SAS - All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form
or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission
of the publisher, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted
by copyright law. For permission requests, write to the publisher at [email protected]

GeoServer DataStores and DBMS Connections 


General considerations 
Database connections are valuable resources and as such must be managed with care [1]:
● they are expensive to create and maintain for the database server itself, since they are
usually child processes of the DBMS server process
● since they are processes, creating a connection is not a zero-cost operation; we should
therefore avoid creating connections on demand, at the moment we need to reach the DB, and
instead create them in advance, so that connection setup time does not add to the time
needed to serve a request
● because a connection requires non-negligible resources on the DBMS server, DBAs tend to
a. limit the number of connections globally available (e.g. Postgres by default has a
limit of 100)
b. limit the lifetime of connections, in order to discourage clients from retaining them
for a really long time

The purpose of a connection pool is to maintain connections to an underlying database between
requests. The benefit is that connection setup only needs to occur once, on the first request,
while subsequent requests reuse existing connections and achieve a performance benefit as a
result.

OK, now let’s go into GeoServer specifics. For most GeoServer DataStores you can choose between
the JNDI [2] variant and the standard one, which basically means that you can either have
GeoServer manage the connection pool for you or configure the pool externally (within the
Application Server of choice) and have GeoServer lean on it to get connections.
The bottom line is that, one way or the other, you always end up using a connection pool in
GeoServer.

Internal Connection Pool 


Let’s start with the internal connection pool, which is there by default in the PostGIS, Oracle
and SQL Server DataStores. GeoServer relies on Apache Commons DBCP [3] for its internal pooling;
let me say upfront, however, that the pool configuration parameters we expose are only a subset
of the possible ones, although the most interesting are there. The ones you might want to
customize are described below.

[1] These recommendations apply not only to GeoServer but are of general usage when it comes
to working with a DBMS.
[2] http://en.wikipedia.org/wiki/Java_Naming_and_Directory_Interface
[3] http://commons.apache.org/proper/commons-dbcp/

max connections
The maximum number of connections the pool can hold. When the maximum number of connections
is reached, additional requests that require a database connection wait until a connection
from the pool becomes available, and eventually time out if that does not happen within the
time specified in the connection timeout. The maximum number of connections limits the number
of concurrent requests that can be made against the database.

min connections
The minimum number of connections the pool will hold. This number of connections is held even
when there are no active requests. When this number is exceeded while serving incoming
requests, additional connections are opened until the pool reaches its maximum size (described
above).

The implications of this number are multiple:

1. If it is much lower than max connections, GeoServer may be slow to respond to unexpected or
random heavy load, because creating new connections takes a non-negligible time. However, this
setup is very good when the DBMS is quite loaded, since it tends to use as few connections as
possible at all times.
2. If it is very close to max connections, GeoServer will respond very quickly to random load.
However, in this case GeoServer puts a big burden on the DBMS, as the pool will try to hold
all needed connections at all times.

validate connections
Flag indicating whether connections from the pool should be validated before they are used. A
connection in the pool can become invalid for a number of reasons, including network breakdown,
database server timeout, etc. The benefit of setting this flag is that an invalid connection
will never be used, which can prevent client errors. The downside is that a small performance
penalty is paid each time a connection is borrowed from the pool, since validation is done by
sending a small query to the server. The cost of this query is usually small, however; for
instance, on PostGIS the validation query is “SELECT 1”.

fetch size
The number of records read from the database in each network exchange. If set too low (<50),
network latency will negatively affect performance; if set too high, it might consume a
significant portion of GeoServer memory and push it towards an Out Of Memory error. Defaults
to 1000. It might be beneficial to raise it if the typical database query reads much more data
than this and there is enough heap memory to hold the results.

connection timeout
Time, in seconds, the connection pool will wait before giving up its attempt to get a new
connection from the database. Defaults to 20 seconds.

This timeout kicks in under heavy load, when the number of requests needing a connection to a
certain DB greatly outnumbers the connections available in the pool; some requests might then
get error messages due to timeouts while acquiring a connection. This condition is not
problematic per se, since a request usually does not use a DB connection for its entire
lifecycle, hence we do not need 100 connections at hand to respond to 100 requests [4];
however, we should strive to limit it, since it queues threads on the connection pool after
they might have allocated memory (e.g. for rendering). We will get back to this later on.

prepared statements
Activates the usage of prepared statements (see the section on prepared statements below).

max open prepared statements
Maximum number of prepared statements kept open and cached for each connection in the pool.
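
As a rough illustration, the parameters above end up among the connection parameters of the
store. The fragment below is a sketch of how they might appear in the datastore.xml file of a
PostGIS store in the GeoServer data directory. The key spellings are an assumption (they are
taken to match the parameter names used by the store), and all values are illustrative rather
than recommendations.

```xml
<!-- Sketch of the pool-related entries of a PostGIS store; keys assumed, values illustrative -->
<connectionParameters>
  <entry key="max connections">20</entry>
  <entry key="min connections">5</entry>
  <entry key="validate connections">true</entry>
  <entry key="fetch size">1000</entry>
  <!-- expressed in seconds -->
  <entry key="Connection timeout">20</entry>
  <entry key="preparedStatements">false</entry>
  <entry key="Max open prepared statements">50</entry>
</connectionParameters>
```

In practice these values are normally edited through the store configuration page rather than
by hand.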

Prepared statement notes 


Prepared statements are used by databases to avoid re-planning the data access every time: the
plan is computed only once, up front, and as long as the statement is cached the plan does not
need to be re-computed.

In business applications that fetch a small amount of data at a time this is beneficial for
performance. In spatial applications, however, where we typically fetch thousands of rows, the
benefit is limited and sometimes turns into a performance problem.
This is the case with PostGIS, which is able to tune the access plan by inspecting the
requested BBOX and deciding whether a sequential scan is preferable (when the BBOX covers most
of the data) or whether using the spatial index is best. So, as a rule of thumb, when working
with PostGIS it’s better not to enable prepared statements.

With other databases there is no choice: Oracle currently works only with prepared statements,
SQL Server only without them (this is more often due to implementation limitations than to
database-specific issues).

[4] For WMS requests the connection is held long enough to fetch and paint data for one
particular layer; the connection is then released to the pool and might be fetched back if the
next layer is against the same database, but during the final PNG/JPEG encoding no connection
is held. WFS GetFeature instead keeps the connection open for as long as is needed to write out
the results (since GML is written as we fetch data from the database).

There is an upside to using prepared statements, though: SQL injection attacks are not possible
when using them. GeoServer code tries hard to avoid this kind of attack when working without
prepared statements, but enabling them makes an attack via filter parameters basically
impossible.

A few more thoughts 


Summarising, when creating standard DataStores for serving vector data from a DBMS in
GeoServer you need to remember that a connection pool will be created internally for each of
them.
This approach is the simplest to implement but might lead to an inefficient distribution of the
connections between different stores in the following cases:
● if we tend to separate tables into different schemas, we will need to create multiple stores
to serve them, since GeoServer works best if the “schema” parameter is specified; this leads
to the creation of (mostly unnecessary) connection pools
● if we want to create stores in different workspaces connecting to the same database, this
again leads to unnecessary duplication of connection pools across stores, and hence to
inefficient usage of connections
Long story short, the fact that the pool is internal to each store may lead to inefficient
usage of connections to the underlying DBMS, since they will be split between multiple stores,
limiting the scalability of each of them. Having 100 connections divided between N standard
DataStores imposes a limit on the maximum number each can use (an even split across 5 stores,
for example, caps each pool at 20 connections). If we instead managed to keep the connections
in a single pool shared among the various DataStores, we would achieve much more efficient
sharing, since, for instance, a single store under high load could scale up to use all the
available connections.

JNDI Connection Pool 


An alternative way of configuring connection pools, which can come to the rescue when trying to
increase the potential for sharing between different DataStores, is to define the connection
pools inside the Application Server and then share them between DataStores, so that all of them
take connections from a single pool.
The example below creates a valid connection pool in Tomcat for an Oracle database [5]:
<Resource name="jdbc/oralocal"
   auth="Container"
   type="javax.sql.DataSource"
   driverClassName="oracle.jdbc.driver.OracleDriver"
   url="jdbc:oracle:thin:@localhost:1521:xe"
   username="dbuser" password="dbpasswd"
   maxActive="20"
   initialSize="0"
   minIdle="0"
   maxIdle="3"
   maxWait="10000"
   timeBetweenEvictionRunsMillis="30000"
   minEvictableIdleTimeMillis="120000"
   testWhileIdle="true"
   poolPreparedStatements="true"
   maxOpenPreparedStatements="100"
   validationQuery="SELECT SYSDATE FROM DUAL"
/>

[5] More information can be found at this link:
http://docs.geoserver.org/stable/en/user/tutorials/tomcat-jndi/tomcat-jndi.html

For more recent Oracle installations (from Oracle 9i onwards), “oracle.jdbc.OracleDriver”
should be used instead of “oracle.jdbc.driver.OracleDriver”, since the latter is deprecated.

Since Tomcat version 7 the following parameters can be set:

maxAge="600000"
rollbackOnReturn="true"

The maxAge parameter specifies the maximum number of milliseconds a connection will be kept in
the pool before it is discarded. When a connection is returned to the pool, its age is checked
against this parameter; if maxAge has been reached, the connection is closed instead of being
returned to the pool. The default value is 0, meaning that connections are left open and no age
checks are done.

If rollbackOnReturn is set to “true”, the pool can terminate the transaction by calling rollback
on the connection as it is returned to the pool. Defaults to false.
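
A sketch of how these two attributes might be combined with the pool definition shown earlier
follows. It assumes that maxAge is provided by the Tomcat JDBC pool
(org.apache.tomcat.jdbc.pool) rather than by Commons DBCP, which is why the factory attribute is
set explicitly; the values are illustrative only.

```xml
<!-- Assumption: the Tomcat JDBC pool is selected through the factory attribute -->
<!-- maxAge of 600000 ms retires connections older than 10 minutes on return -->
<Resource name="jdbc/oralocal"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          driverClassName="oracle.jdbc.OracleDriver"
          url="jdbc:oracle:thin:@localhost:1521:xe"
          username="dbuser" password="dbpasswd"
          maxActive="20"
          maxWait="10000"
          validationQuery="SELECT SYSDATE FROM DUAL"
          maxAge="600000"
          rollbackOnReturn="true"
/>
```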

Using values that are too low for minEvictableIdleTimeMillis is not advisable and may result in
connection failures.

The Application Server will be responsible for managing the lifecycle of the connection pool (as
such, the DBMS driver must be provided to the Application Server), which will be available to
GeoServer DataStores via the standard JNDI mechanism: with this mechanism the stores do not
manage an internal connection pool but share connections from a single large pool.
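
To make the lookup concrete: with the Tomcat resource defined above, a JNDI-enabled store would
typically refer to the pool by a name such as java:comp/env/jdbc/oralocal. Depending on the
container and deployment, the web application may also need to declare the resource in its
web.xml. The fragment below is a sketch of such a declaration using the standard servlet
resource-ref element; the description text is a placeholder, and with Tomcat this step may not
be required at all.

```xml
<!-- Declares the container-managed pool to the GeoServer web application -->
<resource-ref>
  <description>Oracle Datasource</description>
  <res-ref-name>jdbc/oralocal</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <res-auth>Container</res-auth>
</resource-ref>
```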
A nice side effect of using JNDI is that, at the cost of higher complexity, the administrator
is granted much finer-grained control over the configuration of the pool. For instance, it is
possible to specify more sophisticated strategies for the number of connections in the pool. It
is also possible to change the validation strategy for the connections; for instance, it is
possible to:


1. test the connections while they sit idle in the pool rather than when acquiring them (this
speeds up the acquisition of a connection)
2. test the connections only when we give them back to the pool
One last thing to point out: a big pro of using JNDI is that it makes it possible to share the
same pools between stores in different workspaces, thus allowing us to serve the same layers in
different workspaces while efficiently sharing the same connection pool across them.

Configuring pools for production


Connection waiting time and relation with other params 
In general it is important to set the connection waiting time so that the connection pool does
not become a place where threads executing requests queue up under heavy load. Under heavy
load, threads executing requests for a vector layer may well outnumber the available
connections in the pool, and such threads will block while trying to acquire a connection. If
the number of connections is much smaller than the number of incoming requests and the max wait
time is quite long (e.g. 60 seconds), we will find ourselves with many threads waiting for a
long time to acquire a connection after most of the resources they need have already been
allocated, especially the in-memory back buffer if these are WMS requests.

The max wait time should in general be set according to the expected maximum end-to-end
execution time of a request, which includes things like accessing the file system and loading
the data. For instance, for WMS requests we are allowed to specify a maximum response time; if
we set this to N seconds, the max wait time should be set to a value smaller than that, since we
don’t want to waste resources by having threads blocked unnecessarily while waiting for a
connection. In this case it is preferable to fail fast and release resources that would
otherwise be held unnecessarily.
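
As a sketch, assume the maximum response time for WMS requests is configured as 60 seconds (a
hypothetical figure). The JNDI pool below gives up after 10 seconds. Note that maxWait is
expressed in milliseconds, whereas the connection timeout of the internal pool is expressed in
seconds.

```xml
<!-- Fail fast: wait at most 10 s for a connection, well below the assumed 60 s WMS limit -->
<Resource name="jdbc/oralocal"
          auth="Container"
          type="javax.sql.DataSource"
          driverClassName="oracle.jdbc.OracleDriver"
          url="jdbc:oracle:thin:@localhost:1521:xe"
          username="dbuser" password="dbpasswd"
          maxActive="20"
          maxWait="10000"
/>
```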

Maximizing sharing of pools 


How the data is organized between database schemas and tables impacts the degree of flexibility
we have when trying to share connections as well as possible, regardless of whether we are
using JNDI pools or not. Summarising:
● Splitting tables across many schemas makes it hard for GeoServer to access tables belonging
to different schemas unless we switch to JNDI, since the schema must be specified as part of
the connection params when using internal pools
● Using different users for different schemas prevents JNDI from working efficiently across
schemas. When possible, it’s best to use a single dedicated account across schemas
● Generally speaking, a complex combination of users and schemas can lead to an inefficient
split of the available connections into multiple pools

Long story short, whenever possible strive to use a small number of users and, if not using
JNDI, a small number of schemas; JNDI is however a must for organizations willing to create a
complex setup where different workspaces [6] (i.e. Virtual Services) serve the same content
differently.

Validation queries 
Regardless of how we configure the validation query, it is extremely important that we always
remember to validate connections before using them in GeoServer; not doing so might lead to
spurious errors due to stale connections sitting in the pool. This can be achieved with the
internal connection pool (via the validate connections box) as well as with the pools declared
in JNDI (via the validation query mechanism); it is worth remembering that the latter allows
finer-grained configuration.
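
For a JNDI pool, validation before use maps to the testOnBorrow flag together with a
validationQuery. The sketch below assumes a PostGIS database served through the PostgreSQL JDBC
driver; the resource name, database name and credentials are hypothetical.

```xml
<!-- Validate each connection with a cheap query before handing it to GeoServer -->
<Resource name="jdbc/postgis"
          auth="Container"
          type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://localhost:5432/gisdb"
          username="dbuser" password="dbpasswd"
          maxActive="20"
          maxWait="10000"
          validationQuery="SELECT 1"
          testOnBorrow="true"
/>
```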

[6] http://docs.geoserver.org/latest/en/user/services/virtual-services.html