Added the OPT_SERVER_TIMEOUT_LIMIT behaviour. #130

poison · 2014-03-12T14:43:39Z

The option MEMCACHED_BEHAVIOR_SERVER_TIMEOUT_LIMIT was added in libmemcached 1.0.18 (0x01000018).

For more information see https://answers.launchpad.net/libmemcached/+question/239497

The merge request for libmemcached:
https://code.launchpad.net/~493pocbrcycmdw7yksonho9o2qzz-o18bz-d18ecat4t1b76tkfi3vttrkfngli/libmemcached/feature-server_timeout

poison · 2014-03-12T15:03:13Z

Some history on this commit (it took us quite some time to get the patch upstream into libmemcached).

Several parameters control how and when libmemcached marks a server as dead after timeouts. A number of back-off parameters protect against flooding a server that is barely responding, especially with persistent connections.

libmemcached < 1.0.18 works as follows:
whenever a single timeout occurs on a memcache operation, the server is immediately flagged as MEMCACHED_SERVER_STATE_IN_TIMEOUT to prevent flooding it (the memcached server itself might be overloaded, the network might be flaky, or the client might be too loaded to handle the incoming network packets in time).

We use persistent connections between our webservers and our memcached servers in our platform. We observed that in most cases where we started to see "SERVER IS MARKED AS DEAD" errors, the webservers themselves were overloaded. Our memcache timeouts are quite strict (150ms) because we have an SLA on page rendering time at our origin.

What happens in our case is that a memcache operation in a request times out, the server is marked as dead, and it stays in that state for 2 seconds (Memcached::OPT_RETRY_TIMEOUT == 2).

Suppose we have 10 requests per second per FPM worker. This means that the subsequent 20 requests on that worker will see the same server as "SERVER IS MARKED AS DEAD" without even trying to contact it.
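The arithmetic behind that figure can be written out as a small sketch. The function name and parameters are illustrative only and are not part of libmemcached or php-memcached; it assumes a steady request rate per worker.

```python
def fail_fast_requests(requests_per_second: int, retry_timeout_s: int) -> int:
    """Requests one worker handles while the server stays flagged.

    Each of these requests gets the "marked as dead" error without
    any attempt to contact the server.
    """
    return requests_per_second * retry_timeout_s


# 10 requests per second per FPM worker, retry timeout of 2 seconds
print(fail_fast_requests(requests_per_second=10, retry_timeout_s=2))
```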

This was too aggressive, so we introduced the parameter MEMCACHED_BEHAVIOR_SERVER_TIMEOUT_LIMIT, which controls how many of those glitches must occur before a server is flagged as DEAD.
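To make the difference concrete, here is a deliberately simplified model of the behaviour described in this comment. It is not libmemcached's implementation: the class, its fields, and the rule that a successful operation clears the counter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerModel:
    """Toy model of a client's view of one memcached server."""

    timeout_limit: int = 1        # 1 mimics the pre-1.0.18 behaviour described above
    retry_timeout_s: float = 2.0  # mirrors Memcached::OPT_RETRY_TIMEOUT == 2
    timeouts: int = 0             # timeouts counted since the last flag or success
    blocked_until: float = 0.0    # time until which requests fail fast

    def is_available(self, now: float) -> bool:
        return now >= self.blocked_until

    def record_timeout(self, now: float) -> None:
        self.timeouts += 1
        if self.timeouts >= self.timeout_limit:
            # Limit reached: stop contacting the server for retry_timeout_s.
            self.blocked_until = now + self.retry_timeout_s
            self.timeouts = 0

    def record_success(self) -> None:
        # Assumption of this model only: a success clears the counter.
        self.timeouts = 0


old = ServerModel(timeout_limit=1)
new = ServerModel(timeout_limit=5)
old.record_timeout(now=0.0)
new.record_timeout(now=0.0)
print(old.is_available(now=0.5), new.is_available(now=0.5))
```

With a limit of 1, a single timeout is enough to block the server for the whole retry window. With a limit of 5, the same glitch leaves the server usable.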

Setting this option to 5 drastically reduced the number of "SERVER IS MARKED AS DEAD" errors in our production environment. The disadvantage was that whenever a glitch occurred, page generation could be delayed by up to 5x the timeout.
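As a rough sketch of that trade-off, using the 150ms timeout mentioned earlier: the helper below is hypothetical and assumes the worst case, where every counted timeout waits out the full timeout and they all happen one after another.

```python
def worst_case_delay_ms(timeout_ms: int, timeout_limit: int) -> int:
    """Upper bound on the extra wait before the server gets flagged."""
    return timeout_ms * timeout_limit


# 150 ms timeout with a timeout limit of 5
print(worst_case_delay_ms(timeout_ms=150, timeout_limit=5))
```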

It's always a balance you have to make, but we're glad this finally made it upstream so we can stop using our custom forks.

For more info check https://answers.launchpad.net/libmemcached/+question/239497

mkoppanen · 2014-03-21T09:43:26Z

LGTM

mkoppanen added a commit that referenced this pull request Mar 21, 2014

Merge pull request #130 from poison/feature-servertimeoutlimit

94f6444

mkoppanen merged commit 94f6444 into php-memcached-dev:master Mar 21, 2014
