
Internal Cache blows up with many `constrain`s #460

@doyougnu

Description


Hey Levent,

I have a computation that lives entirely in the Query monad. I recently did some profiling, and it looks like the internal Cache is blowing up in size, leading to about 95% GC time. So I'm looking for help understanding why this behavior occurs and how to avoid it.

The basic computation is a fold over a custom abstract syntax tree in which I accumulate SBools. I've been able to reproduce the problem with the following tiny program that folds a list of SBools in the query monad:

import           Control.Monad           (foldM, replicateM)
import           Data.Map                (Map, size)
import qualified Data.SBV                as S
import qualified Data.SBV.Control        as SC
import qualified Data.SBV.Internals      as SI

-- | generate an infinite list of unique strings and take n of them, dropping
-- the empty string
stringList :: Int -> [String]
stringList n = tail . take (n+1) $ concatMap (flip replicateM "abc") [0..]


-- | the test runner: takes a computation that runs in the query monad and an
-- int that dictates the size of the list of SBools
test :: ([S.SBool] -> SC.Query (Map String Bool)) -> Int -> IO (Map String Bool)
test f n = S.runSMT $
           do
             prop' <- S.sBools $! stringList n
             SC.query $ f prop'


-- | I fold over the list of SBools here, constraining at each accumulation;
-- this seems to blow up the internal cache severely, leading to about 95% GC
bad :: [S.SBool] -> SC.Query (Map String Bool)
bad prop' = do b <- foldM helper S.true prop'
               S.constrain b
               fmap (fmap SI.cwToBool) $ S.getModelDictionary <$> SC.getSMTResult
  -- combine the accumulated sbool with the current sbool, constrain the
  -- result and then return it as the new accumulator
  where helper acc x = do let b = acc S.&&& x
                          S.constrain b
                          return b


-- | identical to the bad version except I do not constrain at each accumulation
good :: [S.SBool] -> SC.Query (Map String Bool)
good prop' = do b <- foldM helper S.true prop'
                S.constrain b
                fmap (fmap SI.cwToBool) $ S.getModelDictionary <$> SC.getSMTResult
  -- this helper is equivalent to a pure fold with (S.&&&)
  where helper acc x = return (acc S.&&& x)

main :: IO ()
main = do
  putStrLn "Running Good:\n"
  goodRes <- test good 1000

  putStrLn "Running Bad:\n"
  badRes <- test bad  1000

  -- just ensuring evaluation
  print (size goodRes)
  print (size badRes)
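As a sanity check on the variable names, `stringList` just enumerates words over the alphabet `"abc"` in increasing length and drops the empty word, so the n names are unique. A quick pure check, reproduced standalone (no SBV involved):

```haskell
import Control.Monad (replicateM)

-- Same definition as in the repro above, standalone so it can run in isolation.
stringList :: Int -> [String]
stringList n = tail . take (n + 1) $ concatMap (flip replicateM "abc") [0 ..]

main :: IO ()
main = do
  print (stringList 5)              -- ["a","b","c","aa","ab"]
  print (length (stringList 1000))  -- 1000 distinct names
```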

I commented out the lines for one test at a time and ran with `stack bench --profile cache-test --benchmark-arguments='+RTS -hc -s -RTS'`; I get the following results:

for good:

Running Good:

     278,823,496 bytes allocated in the heap
1000
      15,291,392 bytes copied during GC
Benchmark auto: FINISH
       2,165,352 bytes maximum residency (7 sample(s))
         110,480 bytes maximum slop
              10 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       165 colls,   165 par    0.064s   0.017s     0.0001s    0.0010s
  Gen  1         7 colls,     6 par    0.051s   0.014s     0.0021s    0.0042s

  Parallel GC work balance: 30.47% (serial 0%, perfect 100%)

  TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.004s  (  0.003s elapsed)
  MUT     time    0.332s  (  0.316s elapsed)
  GC      time    0.102s  (  0.027s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.013s  (  0.004s elapsed)
  EXIT    time    0.001s  (  0.000s elapsed)
  Total   time    0.452s  (  0.350s elapsed)

  Alloc rate    840,844,689 bytes per MUT second

  Productivity  73.5% of total user, 90.3% of total elapsed

gc_alloc_block_sync: 3526
whitehole_spin: 0
gen[0].sync: 1
gen[1].sync: 35
Completed 2 action(s).

with heap profile:

[heap profile image: goodtest]

and for bad:

Running Bad:

1000
   2,131,924,376 bytes allocated in the heap
   2,973,614,912 bytes copied during GC
     239,523,080 bytes maximum residency (27 sample(s))
         920,312 bytes maximum slop
             472 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1497 colls,  1497 par   48.666s  12.479s     0.0083s    0.0225s
  Gen  1        27 colls,    26 par   12.980s   3.371s     0.1248s    0.3105s

  Parallel GC work balance: 82.55% (serial 0%, perfect 100%)

  TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.003s  (  0.003s elapsed)
  MUT     time    2.904s  (  2.630s elapsed)
  GC      time   53.022s  ( 13.638s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    8.624s  (  2.212s elapsed)
  EXIT    time    0.001s  (  0.001s elapsed)
  Total   time   64.553s  ( 18.483s elapsed)

  Alloc rate    734,259,598 bytes per MUT second

  Productivity   4.5% of total user, 14.2% of total elapsed

gc_alloc_block_sync: 498638
whitehole_spin: 0
gen[0].sync: 2
gen[1].sync: 689287
Benchmark auto: FINISH
Completed 2 action(s).

with heap profile:

[heap profile image: badtest]

Notice the discrepancy in GC time and calculated Productivity, and the difference in the y-axis of the heap profiles. So I assume that `constrain (a &&& b &&& c)` is not equivalent to `constrain (a &&& b) >> constrain ((a &&& b) &&& c)`. Any suggestions for what's happening here? I've already reduced the number of `constrain` calls in my computation to a minimum, but I'm dealing with about 16000 variables, which easily blows up the cache.
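To make the asymmetry concrete, here is a toy model of how much formula structure each strategy hands to the solver. This deliberately ignores SBV's hash-consing/sharing (an assumption, not a claim about SBV's internals): `badWork` models asserting every prefix conjunction, one per fold step, while `goodWork` models asserting only the final conjunction. The `Term`, `badWork`, and `goodWork` names are hypothetical, purely for illustration:

```haskell
-- A toy term type; no sharing, unlike SBV's actual cached representation.
data Term = Var Int | And Term Term

termSize :: Term -> Int
termSize (Var _)   = 1
termSize (And l r) = 1 + termSize l + termSize r

-- All prefix conjunctions: v1, v1 && v2, v1 && v2 && v3, ...
prefixes :: Int -> [Term]
prefixes n = scanl1 And (map Var [1 .. n])

-- 'bad' asserts every prefix (one constrain per fold step); 'good' asserts
-- only the final conjunction.
badWork, goodWork :: Int -> Int
badWork n  = sum (map termSize (prefixes n))
goodWork n = termSize (last (prefixes n))

main :: IO ()
main = do
  print (badWork 1000)   -- 1000000: quadratic in the number of variables
  print (goodWork 1000)  -- 1999: linear
```

Without sharing, the prefixes have sizes 1, 3, 5, ..., 2n-1, so constraining at each step asserts Θ(n²) total structure versus Θ(n) for a single final constrain; it would not surprise me if something similar shows up in the cache traffic.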

Let me know if I can further assist in any way and thanks for the help.
