storage: Handle timestamp collisions in timestampCache #9100

Merged
merged 5 commits into from
Sep 7, 2016

bdarnell
Contributor

@bdarnell bdarnell commented Sep 4, 2016

When two intervals in the timestamp cache have the same timestamp,
neither of them can be said to own that timestamp. We must adjust
intervals and clear transaction IDs whenever this collision occurs.

Failure to do so allowed one transaction to write after another
transaction had read at the same timestamp, leading to a violation of
serializability.

Fixes #9083

@cockroachdb/stability @nvanbenschoten


This change is Reviewable

@bdarnell bdarnell added the S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting label Sep 4, 2016
@nvb
Contributor

nvb commented Sep 4, 2016

Reviewed 4 of 4 files at r1, 2 of 2 files at r2.
Review status: all files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.


storage/timestamp_cache.go, line 436 [r1] (raw file):

      end = start.Next()
  }
  var ok bool

You didn't touch this, but we might want to improve the name of this flag.


storage/timestamp_cache.go, line 442 [r1] (raw file):

      cache = tc.rCache
  }
  var txnID *uuid.UUID

s/txnID/maxTxnID/ or something a bit more descriptive. Also move up next to max := ...


storage/timestamp_cache.go, line 451 [r1] (raw file):

      } else if max.Equal(ce.timestamp) && txnID != nil &&
          (ce.txnID == nil || !uuid.Equal(*txnID, *ce.txnID)) {
          txnID = nil

It would be nice to alias nil *UUIDs to a variable in this file to make checks against this state more explicit and to make calls to timestampCache.add with a nil txn more clear:

var anyTxnID *uuid.UUID = nil // or some other name

storage/timestamp_cache.go, line 223 [r2] (raw file):

          sCmp := r.Start.Compare(key.Start)
          eCmp := r.End.Compare(key.End)
          if cv.timestamp.Less(timestamp) {

It would be nice to have a roachpb.Timestamp.Compare(roachpb.Timestamp) int method and then compare against < 0, > 0, == 0.
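
For illustration, a three-way comparison of this kind might look roughly like the sketch below. The `Timestamp` type here is a hypothetical local stand-in with assumed `WallTime` and `Logical` fields, not the actual `roachpb.Timestamp`, and `Compare` is the proposed method, not an existing one.

```go
package main

import "fmt"

// Timestamp is a hypothetical stand-in for roachpb.Timestamp; the field
// names and types are assumptions made for this sketch.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// Less reports whether t is strictly earlier than s.
func (t Timestamp) Less(s Timestamp) bool {
	return t.WallTime < s.WallTime ||
		(t.WallTime == s.WallTime && t.Logical < s.Logical)
}

// Compare returns -1 if t is earlier than s, +1 if t is later than s,
// and 0 if the two timestamps are equal.
func (t Timestamp) Compare(s Timestamp) int {
	switch {
	case t.Less(s):
		return -1
	case s.Less(t):
		return 1
	default:
		return 0
	}
}

func main() {
	a := Timestamp{WallTime: 10, Logical: 1}
	b := Timestamp{WallTime: 10, Logical: 2}
	fmt.Println(a.Compare(b), b.Compare(a), a.Compare(a))
}
```

With such a method, the three branches on the cached timestamp could be written as `< 0`, `== 0`, and `> 0`, mirroring the existing `sCmp` and `eCmp` checks.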


storage/timestamp_cache.go, line 337 [r2] (raw file):

              switch {
              case sCmp == 0 && eCmp == 0:
                  // New and old are equal; replace old with new and avoid the

s/; replace old with new and avoid the need to insert new//


storage/timestamp_cache.go, line 346 [r2] (raw file):

                  // Nil: ------------
                  clearTxnIfDifferent(&cv.txnID, txnID)
                  tcache.MoveToEnd(entry)

This shouldn't be needed, because the entry isn't getting a new timestamp, right?


storage/timestamp_cache.go, line 370 [r2] (raw file):

                  clearTxnIfDifferent(&cv.txnID, txnID)
                  r.End = key.Start
                  addRange(r)

Is:

addRange(r)
return

needed here? This should be the last overlap.


storage/timestamp_cache.go, line 383 [r2] (raw file):

                  clearTxnIfDifferent(&cv.txnID, txnID)
                  newKey := tcache.MakeKey(r.Start, key.Start)
                  newEntry := makeCacheEntry(newKey, cacheValue{timestamp: timestamp, txnID: txnID})

Can we define this cacheValue above addRange := func(r interval.Range) { and use in both places?


storage/timestamp_cache.go, line 409 [r2] (raw file):

                  // Old: ------------
                  //
                  // New:

Nit: I like how this diagram defines all three (New, Nil, and Old). Could we do that for all (maybe with Nil using a different symbol like ====)?


storage/timestamp_cache.go, line 425 [r2] (raw file):

                  // Old: ----
                  key.End, r.Start = r.Start, key.End
                  key := tcache.MakeKey(key.End, r.Start)

Can we rename this variable? It confused me for a little while because I didn't see the :=


storage/timestamp_cache.go, line 426 [r2] (raw file):

                  key.End, r.Start = r.Start, key.End
                  key := tcache.MakeKey(key.End, r.Start)
                  newCV := cacheValue{timestamp: cv.timestamp, txnID: txnID}

Again, we might want to define this above? See comment on line 383. Here we could just reassign the value if we need to mutate it.


storage/timestamp_cache.go, line 431 [r2] (raw file):

                  tcache.AddEntryAfter(newEntry, entry)
              case sCmp == 0:
                  // Left-aligned partial overlap; truncate old start and

Here and in a few other states (Right-aligned partial overlap, Left partial overlap, Right partial overlap), if the TxnIDs are the same, we can return without splitting intervals, right (or only doing one split)? This is similar to what we do with New and old are equal.

This means we might want clearTxnIfDifferent to return a flag signifying whether neither UUID is nil and both are equal.
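
A minimal sketch of what such a flag-returning helper could look like follows. The real `clearTxnIfDifferent` is not shown in this thread, so the body and the `bool` return value are assumptions based only on its name and call sites; `TxnID` is a hypothetical stand-in for `uuid.UUID`.

```go
package main

import "fmt"

// TxnID is a hypothetical stand-in for uuid.UUID.
type TxnID [16]byte

// clearTxnIfDifferent leaves *id alone when both IDs are non-nil and
// equal, and otherwise sets *id to nil so the span has no single owner.
// It returns true only in the "same transaction" case. The return value
// is the flag proposed in the review, not part of the code under review.
func clearTxnIfDifferent(id **TxnID, other *TxnID) bool {
	if *id != nil && other != nil && **id == *other {
		return true
	}
	*id = nil
	return false
}

func main() {
	a, b := TxnID{1}, TxnID{2}
	owner := &a

	// Same transaction touches the span again: ownership is kept.
	fmt.Println(clearTxnIfDifferent(&owner, &a), owner != nil)

	// A different transaction collides at the same timestamp: cleared.
	fmt.Println(clearTxnIfDifferent(&owner, &b), owner != nil)
}
```

A caller could use the returned flag to skip interval splitting when the same transaction touches the span again.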


storage/timestamp_cache.go, line 458 [r2] (raw file):

                  tcache.AddEntryAfter(newEntry, entry)
                  // We can add the new range now because it is guaranteed to
                  // be any other overlaps; we ust do so because we've changed

s/ust/must


storage/timestamp_cache.go, line 460 [r2] (raw file):

                  // be any other overlaps; we ust do so because we've changed
                  // our boundaries and continuing to iterate may hit the "no
                  // overlap" panic.

Really? Won't this be guaranteed to be the last overlap?


Comments from Reviewable

@a-robinson
Contributor

For my own edification, would you mind explaining why we need to nil out the span's owner rather than letting the transaction that got there first maintain ownership of it? What would be the harm in letting the "first" transaction finish as long as we prevented the other transactions from continuing with the same timestamp?

@bdarnell
Contributor Author

bdarnell commented Sep 5, 2016

Review status: 3 of 4 files reviewed at latest revision, 14 unresolved discussions.


storage/timestamp_cache.go, line 436 [r1] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

You didn't touch this, but we might want to improve the name of this flag.

`ok` is a fairly idiomatic Go name for this kind of thing, but I've expanded the comments about it.

storage/timestamp_cache.go, line 442 [r1] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/txnID/maxTxnID/ or something a bit more descriptive. Also move up next to max := ...

Done.

storage/timestamp_cache.go, line 451 [r1] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It would be nice to alias nil *UUIDs to a variable in this file to make checks against this state more explicit and to make calls to timestampCache.add with a nil txn more clear:

var anyTxnID *uuid.UUID = nil // or some other name

I don't think that defining aliases for nil like that generally improves clarity.

storage/timestamp_cache.go, line 223 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It would be nice to have a roachpb.Timestamp.Compare(roachpb.Timestamp) int method and then compare against < 0, > 0, == 0.

Maybe, but I'm not sure if we'd use it anywhere else and for a single use this is fine.

storage/timestamp_cache.go, line 337 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/; replace old with new and avoid the need to insert new//

Done.

storage/timestamp_cache.go, line 346 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This shouldn't be needed, because the entry isn't getting a new timestamp, right?

Right. Removed.

storage/timestamp_cache.go, line 370 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is:

addRange(r)
return

needed here? This should be the last overlap.

I don't think it is; removed.

storage/timestamp_cache.go, line 383 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we define this cacheValue above addRange := func(r interval.Range) { and use in both places?

I think we need the late binding of `txnID` in cases where there are multiple overlaps.

storage/timestamp_cache.go, line 409 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Nit: I like how this diagram defines all three (New, Nil, and Old). Could we do that for all (maybe with Nil using a different symbol like ====)?

OK, filled in all the example outputs even when they're the same as the inputs or empty.

storage/timestamp_cache.go, line 425 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we rename this variable? It confused me for a little while because I didn't see the :=

Yeah. I copied this pattern from some existing code but got tripped up by the shadowing at one point too. I'll rename them all (and I'm surprised go vet didn't complain).
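
The shadowing discussed here can be reproduced in a small standalone sketch. The `span` type and `shadowDemo` function are hypothetical and exist only for this illustration; the point is that each `case` clause is its own block, so `:=` inside it declares a new variable and leaves the outer one untouched.

```go
package main

import "fmt"

// span is a hypothetical key range used only for this illustration.
type span struct{ start, end string }

// shadowDemo returns the value seen inside the case clause and the value
// of the outer variable afterwards.
func shadowDemo() (inner, outer span) {
	key := span{start: "a", end: "z"}
	switch {
	case key.start < key.end:
		// This := declares a new key scoped to the case clause. The
		// right-hand side still reads the outer key, because the new
		// variable only comes into scope after the declaration.
		key := span{start: key.start, end: "m"}
		inner = key
	}
	// The outer key was never modified.
	return inner, key
}

func main() {
	inner, outer := shadowDemo()
	fmt.Println("inner:", inner)
	fmt.Println("outer:", outer)
}
```

Giving the inner variable a distinct name, as proposed above, makes it obvious which value each later line refers to.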

storage/timestamp_cache.go, line 431 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Here and in a few other states (Right-aligned partial overlap, Left partial overlap, Right partial overlap), if the TxnIDs are the same, we can return without splitting intervals, right (or only doing one split)? This is similar to what we do with New and old are equal.

This means we might want clearTxnIfDifferent to return a flag signifying whether neither UUID is nil and both are equal.

Good point. It's just an optimization, but it does seem like a common one, since a transaction that touches the same rows multiple times will hit this case. Instead of returning a value from clearTxnIfDifferent, I've made a whole separate switch statement with its own ASCII diagrams for the case when both timestamps and transaction IDs are the same.

storage/timestamp_cache.go, line 458 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/ust/must

Done.

storage/timestamp_cache.go, line 460 [r2] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Really? Won't this be guaranteed to be the last overlap?

Yeah, I think the panics I was seeing before were from other bugs. Removed.

Comments from Reviewable

@a-robinson
Contributor

Would you mind explaining why we need to nil out the span's owner rather than letting the transaction that got there first maintain ownership of it? What would be the harm in letting the "first" transaction finish as long as we prevented the other transactions from continuing with the same timestamp? I assume there will be cases in which the "first" transaction is already done by the time the "second" one sees the cache entry anyway.


Review status: 3 of 4 files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.


Comments from Reviewable

@bdarnell
Contributor Author

bdarnell commented Sep 6, 2016

In this case the two transactions are read-only at the time of the conflict (they go on to write later). We can't require that all read-only transactions have unique timestamps (think of time-travel queries), so we have to allow the reads to proceed and deal with the overlap if and only if they later issue a conflicting write.
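
The scenario can be sketched with a deliberately simplified model. Everything below is hypothetical: timestamps are plain `int64` values, a single `readEntry` stands in for one cached span, and the push rule is an assumption about how a caller might use the returned owner, not the actual implementation.

```go
package main

import "fmt"

// TxnID is a hypothetical stand-in for uuid.UUID.
type TxnID [16]byte

// readEntry is a hypothetical record of the latest read over one span:
// the read timestamp and its sole owner (nil means no single owner).
type readEntry struct {
	ts    int64
	owner *TxnID
}

// recordRead always lets the read proceed. A later timestamp replaces
// the entry; an equal timestamp from a different transaction clears the
// owner, because neither transaction owns that timestamp.
func (e *readEntry) recordRead(ts int64, txn *TxnID) {
	switch {
	case ts > e.ts:
		e.ts, e.owner = ts, txn
	case ts == e.ts:
		if e.owner == nil || txn == nil || *e.owner != *txn {
			e.owner = nil
		}
	}
}

// mustPushWrite reports whether a write at ts by txn would land at or
// below a read that does not belong solely to txn.
func (e *readEntry) mustPushWrite(ts int64, txn *TxnID) bool {
	if e.ts < ts {
		return false
	}
	return e.owner == nil || txn == nil || *e.owner != *txn
}

func main() {
	t1, t2 := TxnID{1}, TxnID{2}
	var e readEntry

	e.recordRead(100, &t1)
	fmt.Println(e.mustPushWrite(100, &t1)) // only t1 has read at 100

	e.recordRead(100, &t2) // collision: the entry becomes unowned
	fmt.Println(e.mustPushWrite(100, &t1))
	fmt.Println(e.mustPushWrite(100, &t2))
}
```

In this model both reads succeed at the same timestamp, and the conflict only surfaces when either transaction later attempts a write at that timestamp.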

@tbg
Member

tbg commented Sep 6, 2016

:lgtm: but I only looked at some of the same-txn optimizations in detail.


Reviewed 4 of 4 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 16 unresolved discussions, some commit checks failed.


storage/timestamp_cache.go, line 416 [r1] (raw file):

// transactions in the cache, the low water timestamp is returned for
// the read timestamps. Also returns an "ok" bool, indicating whether
// an explicit match of the interval was found in the cache.

From the comment, I'm not clear on what the last part means. Is ok equivalent to uuid != nil, or does it mean that the returned timestamp didn't originate from the low water mark?


storage/timestamp_cache.go, line 339 [r2] (raw file):
Should this read:

Segment is unowned if multiple transactions were involved.


Comments from Reviewable

@nvb
Copy link
Contributor

nvb commented Sep 6, 2016

:lgtm:


Reviewed 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


storage/timestamp_cache.go, line 354 [r4] (raw file):

                  // Old:
                  tcache.DelEntry(entry)
              case eCmp >= 0:

For this case and the next, can't we just delete old and extend new's bounds to avoid multiple entries?


Comments from Reviewable

@bdarnell
Contributor Author

bdarnell commented Sep 7, 2016

Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


storage/timestamp_cache.go, line 354 [r4] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

For this case and the next, can't we just delete old and extend new's bounds to avoid multiple entries?

That would be the first time we expand the bounds of an interval rather than shrinking it. Is it safe to do so? Why expand new instead of old? If it's safe to expand an interval, then we could replace this whole switch (the outcome of every case is a single span covering the union of old and new).

Comments from Reviewable

@nvb
Contributor

nvb commented Sep 7, 2016

Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


storage/timestamp_cache.go, line 354 [r4] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

That would be the first time we expand the bounds of an interval rather than shrinking it. Is it safe to do so? Why expand new instead of old? If it's safe to expand an interval, then we could replace this whole switch (the outcome of every case is a single span covering the union of old and new).

Yeah, you're right, that would be a first. I don't think it's worth making this logic any more complex.

Comments from Reviewable

@nvb nvb mentioned this pull request Sep 7, 2016
timestampCache.GetMax{Read,Write} previously took a transaction ID to
act as that transaction. However, this wasn't always correct - when
acting as a transaction we look past that transaction's writes to see
what's underneath, but the trimming of inserted spans means that this
would often return the low water mark, ignoring spans that had been
present but were trimmed away.

Instead, the timestamp cache now returns the transaction associated with
the timestamp it is returning, so the caller can make its own decision
about whether to ignore the timestamp or not.

When two intervals in the timestamp cache have the same timestamp,
neither of them can be said to own that timestamp. We must adjust
intervals and clear transaction IDs whenever this collision occurs.

Failure to do so allowed one transaction to write after another
transaction had read at the same timestamp, leading to a violation of
serializability.

Fixes cockroachdb#9083

When intervals have both the same timestamp and transaction ID
(which is fairly common since this is what happens whenever
a transaction touches the same rows more than once), we can avoid
the splitting required when timestamps are equal but transaction IDs
differ. Introduce (simple) special cases for this scenario.
@bdarnell bdarnell force-pushed the timestamp-cache branch 2 times, most recently from 30514de to 27a521e Compare September 7, 2016 05:11
@bdarnell bdarnell merged commit 7517390 into cockroachdb:master Sep 7, 2016
@bdarnell bdarnell deleted the timestamp-cache branch September 7, 2016 10:59