Chromium Code Reviews

Issue 2938006: Reland 51081:... (Closed)

Created:
10 years, 5 months ago by Mike Belshe
Modified:
9 years, 5 months ago
CC:
chromium-reviews, cbentzel+watch_chromium.org, darin-cc_chromium.org, Paweł Hajdan Jr.
Visibility:
Public.

Description

Reland 51081: This is relandable now because we fixed a problem with the backup sockets, which was the real reason for initially reverting.

We basically don't do late socket binding when a connect has already been started for a request, even if another socket frees up earlier.

The reassignment logic was quite complicated, so I reworked it. Fixing this bug was easy by changing the way FindTopStalledGroup worked, but because that function is called in that loop, changing this case caused the loop to run indefinitely in some cases. This led me to look into unwinding the loop.

The problem really came down to ReleaseSocket/DoReleaseSocket. Because we allow for a pending queue of released sockets, we had to do this looping (which has been a source of bugs before). To fix, I eliminated the pending_releases queue. I also reworked the routes through OnAvailableSocketSlot to unify them and always run asynchronously.

The result is that we no longer have the loop: when one socket is released, we hand out exactly one socket. Note also that this logic slightly changes the priority of how we recycle sockets. Previously, we always consulted the TopStalledGroup. The TopStalledGroup is really only interesting when we're at our max global socket limit, which is rarely the case. In the new logic, when a socket is released, first priority goes to any pending socket in the same group, regardless of that group's priority. The reasoning: why close a socket we already have open? Previously, if the released socket's group was not the highest-priority group, the socket would be marked idle, then closed (to make space for a socket to the TopStalledGroup), and finally a new socket created. I believe the new algorithm, while not perfectly matching the priorities, is more efficient (less churn on sockets) and handles the common case more gracefully.

Finally, OnAvailableSocketSlot does two things. First, it tries to "give" the now-available slot to a particular group, which depends on how OnAvailableSocketSlot was called. If we're currently stalled on max sockets, it will also check (after giving the socket out) whether we can free something up to satisfy a stalled group. If that second step fails for whatever reason, we don't loop. In theory, this could mean that we go under the socket max without dishing out some sockets right away. To make sure that multiple stalled groups can get unblocked, we record the number of stalled groups, and once in this mode, OnAvailableSocketSlot keeps checking for stalled groups until the count finally drops to zero.

BUG=47375
TEST=DelayedSocketBindingWaitingForConnect,CancelStalledSocketAtSocketLimit

Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=52050
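To make the release path concrete, below is a minimal, self-contained C++ sketch of the "hand out exactly one socket per release" idea described above. It is not the actual ClientSocketPoolBase implementation; SimplePool, Group, Request, HandOutSocket, and the body of OnAvailableSocketSlot as written here are hypothetical stand-ins. The only point illustrated is that a freed socket is offered first to a pending request in its own group, and otherwise is parked idle while exactly one slot-available notification goes out.

// Illustrative sketch only (not the actual ClientSocketPoolBase code); the
// class and member names below are made up for this example.
#include <deque>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Socket {};            // stand-in for a connected socket
struct Request { int id; };  // stand-in for a queued request for a socket

class SimplePool {
 public:
  // Called when a consumer returns a socket to the pool. Hands out at most
  // one socket and never loops over other groups.
  void ReleaseSocket(const std::string& group_name,
                     std::unique_ptr<Socket> socket) {
    Group& group = groups_[group_name];
    if (!group.pending_requests.empty()) {
      // A same-group pending request wins, regardless of other groups'
      // priorities: reusing an already-open socket beats closing it just to
      // open a new one elsewhere.
      Request request = group.pending_requests.front();
      group.pending_requests.pop_front();
      HandOutSocket(std::move(socket), request);
      return;
    }
    // No same-group demand: park the socket as idle and announce that one
    // slot is now available for some other group.
    group.idle_sockets.push_back(std::move(socket));
    OnAvailableSocketSlot(group_name);
  }

 private:
  struct Group {
    std::deque<Request> pending_requests;
    std::deque<std::unique_ptr<Socket>> idle_sockets;
  };

  void HandOutSocket(std::unique_ptr<Socket> socket, const Request& request) {
    // Completion of the request would be signalled here (asynchronously in
    // the real patch).
  }

  void OnAvailableSocketSlot(const std::string& group_name) {
    // In the real patch this offers the free slot to another group and, when
    // stalled on the global socket limit, tries to satisfy a stalled group
    // (see the second sketch below).
  }

  std::map<std::string, Group> groups_;
};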
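The second half of the description, recording the number of stalled groups so the pool keeps draining them one slot at a time instead of looping, can be sketched the same way. Again, StalledGroupDrainer and its members are hypothetical names for illustration, not the patch's real data structures.

// Illustrative sketch only: the stalled-group counter that replaces the old
// unbounded loop. When the pool stalls at its global socket limit, the number
// of waiting groups is recorded; each later slot-available event unblocks at
// most one group and decrements the counter.
#include <cstddef>
#include <deque>
#include <iostream>

class StalledGroupDrainer {
 public:
  // Called when a group stalls because the pool is at its max socket limit.
  void OnGroupStalled(int group_id) { stalled_groups_.push_back(group_id); }

  // Called once per freed socket slot; never loops internally.
  void OnAvailableSocketSlot() {
    if (remaining_to_unblock_ == 0) {
      // Entering (or re-entering) "stalled on max sockets" mode: remember how
      // many groups are waiting right now.
      remaining_to_unblock_ = stalled_groups_.size();
    }
    if (remaining_to_unblock_ == 0 || stalled_groups_.empty())
      return;  // nothing to do; no looping or retries within this call
    int group_id = stalled_groups_.front();
    stalled_groups_.pop_front();
    --remaining_to_unblock_;
    std::cout << "gave the free slot to stalled group " << group_id << "\n";
  }

 private:
  std::deque<int> stalled_groups_;
  std::size_t remaining_to_unblock_ = 0;
};

int main() {
  StalledGroupDrainer drainer;
  drainer.OnGroupStalled(1);
  drainer.OnGroupStalled(2);
  drainer.OnAvailableSocketSlot();  // unblocks group 1
  drainer.OnAvailableSocketSlot();  // unblocks group 2
  return 0;
}

Each OnAvailableSocketSlot call corresponds to one slot freeing up, so the backlog drains across events rather than inside a single synchronous loop, which is what removes the old loop and its associated bugs.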

Patch Set 1 #

Stats: +386 lines, -306 lines
M net/socket/client_socket_pool_base.h            14 chunks  +28 lines, -61 lines
M net/socket/client_socket_pool_base.cc           11 chunks  +98 lines, -127 lines
M net/socket/client_socket_pool_base_unittest.cc   8 chunks  +252 lines, -112 lines
M net/socket/socks_client_socket_pool_unittest.cc  3 chunks  +8 lines, -6 lines

Messages

Total messages: 2 (0 generated)
Mike Belshe
There are no code changes from the original checkin. I verified that the new unittest indeed catches the crash that previously caused the revert of this patch.

I plan to land this with a TBR, although it has already been reviewed.
10 years, 5 months ago (2010-07-11 02:33:26 UTC) #1
willchan no longer on Chromium
10 years, 5 months ago (2010-07-11 05:03:11 UTC) #2
I think there were some reports of chrome-bot seeing flaky crashes on it the
last time, so you may want to look out for that.  Good luck!

LGTM.

On Sat, Jul 10, 2010 at 7:33 PM, <[email protected]> wrote:

> Reviewers: willchan,
>
> Message:
> There are no code changes from the original checkin.  I verified that the
> new
> unittest indeed catches the crash that previously caused the revert of this
> patch.
>
> I plan to land this with a TBR, although it has already been reviewed.
>
> Description:
> Reland 51081:
> This is relandable now because we fixed a problem with the backup sockets,
> which was the real reason for initially reverting.
>
> We basically don't do late socket binding when a connect has already
> been started for a request, even if another socket frees up earlier.
>
> The reassignment logic was quite complicated, so I reworked it.  Fixing
> this bug was easy by changing the way FindTopStalledGroup worked, but
> because that function is called in that loop, changing this case
> caused the loop to go infinitely in some cases.  This led me to look
> into unwinding the loop.
>
> The problem really came down to ReleaseSocket/DoReleaseSocket.  Because
> we allow for a pending queue of released sockets, we had to do this
> looping (which has been a source of bugs before).  To fix, I
> eliminated the pending_releases queue.  I also reworked the routes
> through OnAvailableSocketSlot to unify them and always run asynchronously.
>
> The result is that now we no longer have the loop.  So when one
> socket is released, we hand out exactly one socket.  Note also that
> this logic slightly changes the priority of how we recycle sockets.
> Previously, we always consulted the TopStalledGroup.  The TopStalledGroup
> is really only interesting in the case where we're at our max global
> socket limit, which is rarely the case.  In the new logic, when a
> socket is released, first priority goes to any pending socket in the
> same group, regardless of that group's priority.  The reason is  why
> close a socket we already have open?  Previously, if the released
> socket's group was not the highest priority group, the socket would
> be marked idle, then closed (to make space for a socket to the
> TopStalledGroup), and finally a new socket created.  I believe the
> new algorithm, while not perfectly matching the priorities, is more
> efficient (less churn on sockets), and also is more graceful to the
> common case.
>
> Finally OnAvailableSocketSlot does two things.  First, it tries to
> "give" the now available slot to a particular group, which is dependent
> on how OnAvailableSocketSlot was called.  If we're currently
> stalled on max sockets, it will also check (after giving the socket
> out) to see if we can somehow free something up to satisfy a
> stalled group.  If that second step fails for whatever reason,
> we don't loop.  In theory, this could mean that we go under the
> socket max and didn't dish out some sockets right away.  To make
> sure that multiple stalled groups can get unblocked, we'll record
> the number of stalled groups, and once in this mode,
> OnAvailableSocketSlot will keep checking for stalled groups until the
> count finally drops to zero.
>
>
> BUG=47375
> TEST=DelayedSocketBindingWaitingForConnect,CancelStalledSocketAtSocketLimit
>
>
>
>
> Please review this at http://codereview.chromium.org/2938006/show
>
> SVN Base: svn://chrome-svn/chrome/trunk/src/
>
> Affected files:
>  M     net/socket/client_socket_pool_base.h
>  M     net/socket/client_socket_pool_base.cc
>  M     net/socket/client_socket_pool_base_unittest.cc
>  M     net/socket/socks_client_socket_pool_unittest.cc
>
>
>
