Added the OPT_SERVER_TIMEOUT_LIMIT behaviour. #130

Merged

Member

@poison poison commented Mar 12, 2014

The option MEMCACHED_BEHAVIOR_SERVER_TIMEOUT_LIMIT was added in libmemcached 1.0.18 (0x01000018).
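The hex value encodes the version number 1.0.18. The sketch below decodes it and gates on it. It assumes the hex digits spell out decimal digits as two for major, three for minor and three for patch, which is inferred from 0x01000018 ↔ 1.0.18 rather than taken from libmemcached's documentation. The function names are hypothetical. Because this layout preserves ordering, a plain integer comparison against 0x01000018 is enough to tell whether the option is available.

```python
def decode_version_hex(value: int) -> tuple[int, int, int]:
    """Decode a version hex such as 0x01000018 into (major, minor, patch).

    Assumption: the hex digits spell out decimal digits as MM mmm ppp.
    """
    digits = f"{value:08x}"  # 0x01000018 -> "01000018"
    return int(digits[0:2]), int(digits[2:5]), int(digits[5:8])


def has_server_timeout_limit(value: int) -> bool:
    # The option first appeared in libmemcached 1.0.18 (0x01000018).
    return value >= 0x01000018


print(decode_version_hex(0x01000018))  # (1, 0, 18)
print(has_server_timeout_limit(0x01000016))  # False
```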

For more information, see https://answers.launchpad.net/libmemcached/+question/239497

The merge request for libmemcached:
https://code.launchpad.net/~493pocbrcycmdw7yksonho9o2qzz-o18bz-d18ecat4t1b76tkfi3vttrkfngli/libmemcached/feature-server_timeout

Member Author

poison commented Mar 12, 2014

A bit of history on this commit (it took us quite some time to get the patch into libmemcached upstream).

Several parameters control how and when libmemcached marks a server as dead after a timeout. Several back-off parameters are in place to protect against flooding a server that is barely responding, especially with persistent connections.

In libmemcached < 1.0.18, it works as follows:
whenever a single timeout occurs on a memcache operation, the server is immediately flagged as MEMCACHED_SERVER_STATE_IN_TIMEOUT to prevent flooding it (the memcached server itself might be overloaded, the network might be unreliable, or the client might be too loaded to handle incoming network packets in time).
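The pre-1.0.18 rule can be modelled in a few lines. This is a simplified, hypothetical model (the `ServerModel` class and its fields are illustrative and are not libmemcached internals). A single timeout puts the server into the in-timeout state. Every operation issued before `retry_timeout` seconds have elapsed then fails fast without touching the network.

```python
from dataclasses import dataclass


@dataclass
class ServerModel:
    """Hypothetical model of the pre-1.0.18 rule, not libmemcached code."""

    retry_timeout: float  # seconds, playing the role of Memcached::OPT_RETRY_TIMEOUT
    next_retry: float = 0.0  # operations before this time are skipped

    def in_timeout(self, now: float) -> bool:
        # While in timeout, operations fail fast ("SERVER IS MARKED AS DEAD").
        return now < self.next_retry

    def record_timeout(self, now: float) -> None:
        # One timeout is enough to mark the server for retry_timeout seconds.
        self.next_retry = now + self.retry_timeout


server = ServerModel(retry_timeout=2.0)
server.record_timeout(now=0.0)
print(server.in_timeout(now=1.5))  # True: skipped without contacting memcached
print(server.in_timeout(now=2.0))  # False: the server is tried again
```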

We use persistent connections between our webservers and our memcached servers. We observed that in most cases, when we started seeing "SERVER IS MARKED AS DEAD" errors, the webservers themselves were overloaded. Our memcache timeouts are quite strict (150 ms) because we have an SLA on page rendering time at our origin.

In our case, a memcache operation in one request times out, the server is marked as dead, and it stays marked as dead for 2 seconds (Memcached::OPT_RETRY_TIMEOUT == 2).

Suppose we have 10 requests per second per FPM worker. This means that the next 20 requests on that worker see the same server as "SERVER IS MARKED AS DEAD" without even attempting to contact it.
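The figure of 20 requests follows from the two numbers above. With a request rate r per FPM worker and a retry timeout t_retry, every request that lands inside the retry window fails fast:

```latex
N_{\text{fail-fast}} = r \times t_{\text{retry}}
  = 10\,\frac{\text{requests}}{\text{s}} \times 2\,\text{s}
  = 20\ \text{requests}
```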

This was too aggressive, so we introduced the parameter MEMCACHED_BEHAVIOR_SERVER_TIMEOUT_LIMIT, which controls how many of these glitches must occur before a server is flagged as dead.
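The idea behind the limit can be sketched with a second hypothetical model. This is not libmemcached's actual implementation. The server is only put into the in-timeout state once `timeout_limit` timeouts have been counted. Clearing the counter after a successful operation is an assumption made for illustration. With `timeout_limit=1`, the model reduces to the pre-1.0.18 behaviour.

```python
from dataclasses import dataclass


@dataclass
class LimitedServerModel:
    """Hypothetical model of a server timeout limit, not libmemcached code."""

    retry_timeout: float  # seconds the server stays marked once the limit is hit
    timeout_limit: int  # timeouts needed before the server is marked
    timeouts: int = 0  # timeouts counted so far
    next_retry: float = 0.0  # operations before this time are skipped

    def in_timeout(self, now: float) -> bool:
        return now < self.next_retry

    def record_success(self) -> None:
        # Assumption: a successful operation clears the counter.
        self.timeouts = 0

    def record_timeout(self, now: float) -> None:
        self.timeouts += 1
        if self.timeouts >= self.timeout_limit:
            # Limit reached: mark the server and start counting afresh.
            self.next_retry = now + self.retry_timeout
            self.timeouts = 0


server = LimitedServerModel(retry_timeout=2.0, timeout_limit=5)
for i in range(4):
    server.record_timeout(now=i * 0.15)
print(server.in_timeout(now=0.6))  # False: 4 glitches are tolerated
server.record_timeout(now=0.6)
print(server.in_timeout(now=0.7))  # True: the 5th timeout marks the server
```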

Setting this option to 5 drastically reduced the number of "SERVER IS MARKED AS DEAD" errors in our production environment. The downside was that whenever a glitch occurred, page generation was delayed by 5x the timeout.
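Plugging the 150 ms timeout quoted earlier into the 5x penalty gives the cost of a glitch. This assumes the full penalty lands on a single page render. With a limit L:

```latex
t_{\text{extra}} = L \times t_{\text{timeout}} = 5 \times 150\,\text{ms} = 750\,\text{ms}
```

Against an SLA on page rendering time, this extra latency is what gets traded for fewer servers being wrongly marked as dead.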

It's always a trade-off, but we're glad this finally made it upstream so we can stop using our custom forks.

For more info, see https://answers.launchpad.net/libmemcached/+question/239497

@mkoppanen
Member

LGTM

mkoppanen added a commit that referenced this pull request Mar 21, 2014
Added the OPT_SERVER_TIMEOUT_LIMIT behaviour.
@mkoppanen mkoppanen merged commit 94f6444 into php-memcached-dev:master Mar 21, 2014