Chaff: Engineering an Efficient SAT Solver
Matthew W. Moskewicz, Department of EECS, UC Berkeley, [email protected]
Conor F. Madigan, Department of EECS, MIT, [email protected]
Ying Zhao, Lintao Zhang, Sharad Malik, Department of Electrical Engineering, Princeton University, {yingzhao, lintaoz, sharad}@ee.princeton.edu
Authorized licensed use limited to: J.R.D. Tata Memorial Library Indian Institute of Science Bengaluru. Downloaded on September 21,2023 at 23:19:23 UTC from IEEE Xplore. Restrictions apply.
clause consists of only literals with value 0 and one unassigned literal, then that unassigned literal must take on a value of 1 to make f sat. Clauses in this state are said to be unit, and this rule is referred to as the unit clause rule. The necessary variable assignment associated with giving the unassigned literal a value of 1 is referred to as an implication. In general, BCP therefore consists of the identification of unit clauses and the creation of the associated implications. In the pseudo-code from above, bcp() carries out BCP transitively until either there are no more implications (in which case it returns true) or a conflict is produced (in which case it returns false). A conflict occurs when implications for setting the same variable to both 1 and 0 are produced.

At the time a decision is made, some variable state exists and is represented by the decision stack. Any implication generated following a new decision is directly triggered by that decision, but predicated on the entire prior variable state. By associating each implication with the triggering decision, this dependency can be compactly recorded in the form of an integer tag, referred to as the decision level (DL). For the basic DP search, the DL is equivalent to the height of the decision stack at the time the implication is generated.

To explain what handleConflict() does, we note that we can invalidate all the implications generated on the most recent decision level simply by flipping the value of the most recent decision assignment. Therefore, to deal with a conflict, we can just undo all those implications, flip the value of the decision assignment, and allow BCP to then proceed as normal. If both values have already been tried for this decision, then we backtrack through the decision stack until we encounter a decision that has not been tried both ways, and proceed from there in the manner described above. Clearly, in backtracking through the decision stack, we invalidate any implications with decision levels equal to or greater than the decision level to which we backtracked. If no decision can be found which has not been tried both ways, that indicates that f is not satisfiable.

Thus far we have focused on the overall structure of the basic DP search algorithm. The following sections describe features specific to Chaff.

2. Optimized BCP
In practice, for most SAT problems, a major portion (greater than 90% in most cases) of the solver's run time is spent in the BCP process. Therefore, an efficient BCP engine is key to any SAT solver.

To restate the semantics of the BCP operation: given a formula and a set of assignments with DLs, deduce any necessary assignments and their DLs, and continue this process transitively by adding the necessary assignments to the initial set. Necessary assignments are determined exclusively by repeated applications of the unit clause rule. Stop when no more necessary assignments can be deduced, or when a conflict is identified.

For the purposes of this discussion, we say that a clause is implied iff all but one of its literals is assigned to zero. So, to implement BCP efficiently, we wish to find a way to quickly visit all clauses that become newly implied by a single addition to a set of assignments.

The most intuitive way to do this is to simply look at every clause in the database that contains a literal that the current assignment sets to 0. In effect, we would keep a counter for each clause of how many value 0 literals are in the clause, and modify the counter every time a literal in the clause is set to 0. However, if the clause has N literals, there is really no reason that we need to visit it when 1, 2, 3, 4, ..., N-1 literals are set to zero. We would like to only visit it when the "number of zero literals" counter goes from N-2 to N-1.

As an approximation to this goal, we can pick any two literals not assigned to 0 in each clause to watch at any given time. Thus, we can guarantee that until one of those two literals is assigned to 0, there cannot be more than N-2 literals in the clause assigned to zero, that is, the clause is not implied. Now, we need only visit each clause when one of its two watched literals is assigned to zero. When we visit each clause, one of two conditions must hold:

(1) The clause is not implied, and thus at least 2 literals are not assigned to zero, including the other currently watched literal. This means at least one non-watched literal is not assigned to zero. We choose this literal to replace the one just assigned to zero. Thus, we maintain the property that the two watched literals are not assigned to 0.
(2) The clause is implied. Follow the procedure for visiting an implied clause (usually, this will generate a new implication, unless the clause is already sat). One should take note that the implied variable must always be the other watched literal, since, by definition, the clause only has one literal not assigned to zero, and one of the two watched literals is now assigned to zero.

It is invariant that in any state where a clause can become newly implied, both watched literals are not assigned to 0. A key benefit of the two literal watching scheme is that at the time of backtracking, there is no need to modify the watched literals in the clause database. Therefore, unassigning a variable can be done in constant time. Further, reassigning a variable that has been recently assigned and unassigned will tend to be faster than the first time it was assigned. This is true because the variable may only be watched in a small subset of the clauses in which it was previously watched. This significantly reduces the total number of memory accesses, which, exacerbated by the high data cache miss rate, are the main bottleneck for most SAT implementations. Figure 1 illustrates this technique. It shows how the watched literals for a single clause change under a series of assignments and unassignments. Note that the initial choice of watched literals is arbitrary, and that for the purposes of this example, the exact details of how the sequence of assignments and unassignments is generated are irrelevant.

One of the SATO [13] BCP schemes has some similarities to this one in the sense that it also watches two literals (called the head and tail literals by its authors) to detect unit clauses and conflicts. However, our algorithm is different from SATO's in that we do not require a fixed direction of motion for the watched literals, while in SATO, the head literal can only move towards the tail literal and vice versa. Therefore, in SATO, unassignment has the same complexity as assignment.

3. Variable State Independent Decaying Sum (VSIDS) Decision Heuristic
Decision assignment consists of the determination of which new variable and state should be selected each time decide() is called. A lack of clear statistical evidence supporting one decision strategy over others has made it difficult to determine
what makes a good decision strategy and what makes a bad one. To explain this further, we briefly review some common strategies. For a more comprehensive review of the effect of decision strategies on SAT solver performance, see [7] by Silva.

The simplest possible strategy is to simply select the next decision randomly from among the unassigned variables, an approach commonly denoted as RAND. At the other extreme, one can employ a heuristic involving the maximization of some moderately complex function of the current variable state and the clause database (e.g. BOHM and MOMS heuristics).

One of the most popular strategies, which falls somewhere in the middle of this spectrum, is the dynamic largest individual sum (DLIS) heuristic, in which one selects the literal that appears most frequently in unresolved clauses. Variations on this strategy (e.g. RDLIS and DLCS) are also possible. Other slightly more sophisticated heuristics (e.g. JW-OS and JW-TS) have been developed as well, and the reader is referred again to [7] for a full description of these other methods.

Clearly, with so many strategies available, it is important to understand how best to evaluate them. One can consider, for instance, the number of decisions performed by the solver when processing a given problem. Since this statistic has the feel of a good metric for analyzing decision strategies (fewer decisions ought to mean smarter decisions were made, the reasoning goes), it has been used almost exclusively as the comparator in the scant literature on the subject. However, not all decisions yield an equal number of BCP operations, and as a result, a shorter sequence of decisions may actually lead to more BCP operations than a longer sequence of decisions, raising the question: what does the number of decisions really tell us? The same argument applies to statistics involving conflicts. Furthermore, it is also important to recognize that not all decision strategies have the same computational overhead, and as a result, the "best" decision strategy (even if that determination is based on a good combination of the available computation statistics) may actually be the slowest if the overhead is significant enough. All we really want to know is which strategy is fastest, regardless of the computation statistics. No clear answer exists in the literature, though, based on [7], DLIS would appear to be a solid all-around strategy. However, even RAND performs well on the problems described in that paper. While developing our solver, we implemented and tested all of the strategies outlined above, and found that we could design a considerably better strategy for the range of problems on which we tested our solver. This strategy, termed Variable State Independent Decaying Sum (VSIDS), is described as follows:

Overall, this strategy can be viewed as attempting to satisfy the conflict clauses, but particularly attempting to satisfy recent conflict clauses. Since difficult problems generate many conflicts (and therefore many conflict clauses), the conflict clauses dominate the problem in terms of literal count, so this approach distinguishes itself primarily in how the low-pass filtering of the statistics (indicated by step (5)) favors the information generated by recent conflict clauses. We believe this is valuable because it is the conflict clauses that primarily drive the search process on difficult problems. And so this decision strategy can be viewed as directly coupling that driving force to the decision process.

Of course, another key property of this strategy is that since it is independent of the variable state (except insofar as we must choose an unassigned variable), it has very low overhead, since the statistics are only updated when there is a conflict and, correspondingly, a new conflict clause. Even so, decision-related computation still accounts for ~10% of the run-time on some difficult instances. (Conflict analysis is also ~10% of the run-time, with the remaining ~80% of the time spent in BCP.) Ultimately, employing this strategy dramatically (i.e. by an order of magnitude) improved performance on all the most difficult problems without hurting performance on any of the simpler problems, which we viewed as the true metric of its success.

4. Other Features
Chaff employs a conflict resolution scheme that is philosophically very similar to GRASP, employing the same type of conflict analysis, conflict clause addition, and UIP-identification. There are some differences that the authors believe have dramatically enhanced the simplicity and elegance of the implementation, but due to space limitations, we will not delve into that subject here.

4.1 Clause Deletion
Like many other solvers, Chaff supports the deletion of added conflict clauses to avoid a memory explosion. However, since the method for doing so in Chaff differs somewhat from the standard method, we briefly describe it here. Essentially, Chaff uses scheduled lazy clause deletion. When each clause is added, it is examined to determine at what point in the future, if any, the clause should be deleted. The metric used is relevance, such that when more than N (where N is typically 100-200) literals in the clause will become unassigned for the first time, the clause will be marked as deleted. The actual memory associated with deleted clauses is recovered with an infrequent monolithic database compaction step.
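Returning to the two-literal watching scheme of Section 2, the following sketch shows conditions (1) and (2) and the property that backtracking leaves the watch lists untouched. It is a minimal illustration under stated assumptions, not Chaff's implementation: literals are DIMACS-style signed integers, every clause has at least two distinct literals, slots 0 and 1 of each clause hold its watched literals, and the class and method names (WatchedBCP, decide, bcp, backtrack) are hypothetical. Decision levels follow the basic DP convention in which the DL equals the height of the decision stack.

```python
from collections import defaultdict


class WatchedBCP:
    """Sketch of BCP with two watched literals per clause.

    Literals are nonzero ints (3 means x3, -3 means NOT x3).
    Assumes every clause has at least two distinct literals.
    """

    def __init__(self, clauses):
        self.clauses = [list(c) for c in clauses]
        self.value = {}           # var -> bool (absent = unassigned)
        self.level = {}           # var -> decision level of its assignment
        self.trail = []           # assigned literals in assignment order
        self.trail_lim = []       # trail length when each decision was made
        self.qhead = 0            # next trail position BCP has not processed
        self.watches = defaultdict(list)  # literal -> indices of clauses watching it
        for ci, c in enumerate(self.clauses):
            for lit in c[:2]:     # slots 0 and 1 hold the watched literals
                self.watches[lit].append(ci)

    def lit_value(self, lit):
        v = self.value.get(abs(lit))
        return None if v is None else (v if lit > 0 else not v)

    def assign(self, lit):
        self.value[abs(lit)] = lit > 0
        self.level[abs(lit)] = len(self.trail_lim)
        self.trail.append(lit)

    def decide(self, lit):
        self.trail_lim.append(len(self.trail))  # open a new decision level
        self.assign(lit)

    def bcp(self):
        """Propagate transitively; return False on conflict, True otherwise."""
        while self.qhead < len(self.trail):
            false_lit = -self.trail[self.qhead]  # this literal just became 0
            self.qhead += 1
            watching = self.watches[false_lit]
            i = 0
            while i < len(watching):
                ci = watching[i]
                c = self.clauses[ci]
                if c[0] == false_lit:            # keep the falsified watch in slot 1
                    c[0], c[1] = c[1], c[0]
                other = c[0]
                for k in range(2, len(c)):
                    if self.lit_value(c[k]) is not False:
                        # Condition (1): a non-watched literal is not 0; watch it instead.
                        c[1], c[k] = c[k], c[1]
                        self.watches[c[1]].append(ci)
                        watching[i] = watching[-1]  # O(1) removal from this list
                        watching.pop()
                        break
                else:
                    # Condition (2): clause is implied; only `other` can be non-0.
                    val = self.lit_value(other)
                    if val is False:
                        self.qhead = len(self.trail)
                        return False             # both values implied: conflict
                    if val is None:
                        self.assign(other)       # implication at the current DL
                    i += 1
        return True

    def backtrack(self, level):
        """Unassign everything above `level`. Watch lists are NOT modified."""
        if len(self.trail_lim) <= level:
            return
        cut = self.trail_lim[level]
        for lit in self.trail[cut:]:
            del self.value[abs(lit)]
            del self.level[abs(lit)]
        del self.trail[cut:]
        del self.trail_lim[level:]
        self.qhead = len(self.trail)
```

Note that backtrack() only pops assignments off the trail; as the text argues, no watched literal has to move when a variable is unassigned, and a watch that moved earlier simply stays where it is.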
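For the VSIDS heuristic of Section 3, the sketch below assumes the commonly cited form of the heuristic: one counter per literal, incremented for each literal of every added clause, with all counters periodically divided by a constant (the low-pass filtering that the text calls step (5)). The decay_period and divisor values, the seeding of the counters from the original clauses, and the deterministic tie-breaking are illustrative assumptions, as are the class and method names.

```python
from collections import defaultdict


class VSIDS:
    """Sketch of a VSIDS-style decision heuristic (names/parameters hypothetical)."""

    def __init__(self, clauses, decay_period=256, divisor=2.0):
        self.score = defaultdict(float)    # literal -> decaying sum
        self.decay_period = decay_period   # added clauses between decays (assumed)
        self.divisor = divisor             # decay constant (assumed)
        self.added = 0
        for clause in clauses:             # assumption: original clauses seed the sums
            for lit in clause:
                self.score[lit] += 1

    def on_clause_added(self, clause):
        """Update statistics; called only when a conflict clause is added."""
        for lit in clause:
            self.score[lit] += 1
        self.added += 1
        if self.added % self.decay_period == 0:
            # Periodic division acts as a low-pass filter that favors
            # literals from recently added clauses.
            for lit in self.score:
                self.score[lit] /= self.divisor

    def pick(self, assigned):
        """Return the highest-scoring literal of an unassigned variable, or None.

        The scores never depend on the current assignment; `assigned` (any
        container of variable indices) is consulted only to skip assigned ones.
        Ties go to the smaller variable index, then to positive polarity.
        """
        candidates = [lit for lit in self.score if abs(lit) not in assigned]
        if not candidates:
            return None
        return max(candidates, key=lambda lit: (self.score[lit], -abs(lit), lit > 0))
```

Because on_clause_added is the only place the scores change, the statistics are touched only on conflicts, as the text describes. The linear scan in pick keeps the sketch short; a real solver would typically avoid it with a sorted or heap-ordered structure.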
frequency of restarts and the characteristics of the transient randomness are configurable in the final implementation. It should be noted that restarts impact the completeness of the algorithm. If all clauses were kept, however, the algorithm would still be complete, so completeness could be maintained by increasing the relevance parameter N slowly with time. GRASP uses a similar strategy to maintain completeness by extending the restart period with each restart (Chaff also does this by default, since it generally improves performance).

Note that Chaff's restarts differ from those employed by, for instance, GRASP in that they do not affect the current decision statistics. They are mainly intended to provide a chance to change early decisions in view of the current problem state, including all added clauses and the current search path. With default settings, Chaff may restart in this sense thousands of times on a hard instance (sat or unsat), although similar results can often (or at least sometimes) be achieved with restarts completely disabled.

5. Experimental Results
On smaller examples with relatively inconsequential run times, Chaff is comparable to any other solver. However, on larger examples where other solvers struggle or give up, Chaff dominates by completing in up to one to two orders of magnitude less time than the best public domain solvers.

Chaff has been run on and compared with other solvers on almost a thousand benchmark formulas. Obviously, it is impossible to provide complete results for each individual benchmark. Instead, we will present summary results for each class of benchmarks. Comparisons were done with GRASP, as well as SATO. GRASP provides for a range of parameters that can be individually tuned. Two different recommended sets of parameters were used (GRASP(A) and GRASP(B)). For SATO, the default settings as well as -g100 (which restricts the size of added clauses to be 100 literals as opposed to the default of 20) were used. Chaff was used with the default cherry.smj configuration in all cases, except for the DIMACS pret* instances, which required a single parameter change to the decision strategy. All experiments were done on a 4-CPU 336 MHz UltraSparc II Solaris machine with 4GB main memory. Memory usage was typically 50-150MB depending on the run time of each instance.

Table 1 provides the summary results for the DIMACS [4] benchmark suite. Each row is a set of individual benchmarks grouped by category. For GRASP, both options resulted in several benchmarks aborting after 100 secs, which was sufficient for both SATO and Chaff to complete all instances. On examples that the others also complete, Chaff is comparable to the others, with some superiority on the hole and par16 classes, which seem to be among the more difficult ones. Overall, most of the DIMACS benchmarks are now considered easy, as there are a variety of solvers that excel on various subsets of them. Note that some of the DIMACS benchmarks, such as the large 3-sat instance sets 'f' and 'g', as well as the par32 set, were not used, since none of the solvers considered here performs well on these benchmark classes.

The next set of experiments was done using the CMU Benchmark Suite [11]. This consists of hard problems, satisfiable and unsatisfiable, arising from verification of microprocessors (for a detailed description of these benchmarks and Chaff's performance on them, see [12]). It is here that Chaff's prowess begins to show more clearly. For SSS.1.0, Chaff is about an order of magnitude faster than the others and can complete all the examples within 100 secs. Both GRASP and SATO abort the 5 hard unsat instances in this set, which are known to take both GRASP and SATO significantly longer to complete than the sat instances. Results on using randomized restart techniques with the newest version of GRASP have been reported on a subset of these examples in [1]. We have been unable to reproduce all of those results, due to the unavailability of the necessary configuration profiles for GRASP (again, see [12]). However, comparing our experiments with the reported results shows the superiority of Chaff, even given a generous margin for the differences in the testing environments. For SSS.1.0.a, Chaff completed all 9 of the benchmarks; SATO and GRASP could do only two. For SSS-SAT.1.0, SATO aborted 32 of the first 41 instances, at which point we decided to stop running any further instances for lack of hope and limited compute cycles. GRASP was not competitive at all on this set. Chaff again completed all 100 in less than 1000 secs, within a 100 sec limit for each instance. In FVP-UNSAT.1.0, both GRASP and SATO could only complete one easy example and aborted the next two. Chaff completed all 4. Finally, for VLIW-SAT.1.0, both SATO and GRASP aborted the first 19 of twenty instances tried. Chaff finished all 100 in less than 10000 seconds total.

For many of these benchmarks, only incomplete solvers (not considered here) can find solutions in time comparable to Chaff, and for the harder unsatisfiable instances in these benchmarks, no solver the authors were able to run was within 10x of Chaff's performance, which prohibited running them on the harder problems. When enough information is released to run GRASP and locally reproduce results as in [1], these results will be revisited, although the results given would indicate that Chaff is still a full 2 orders of magnitude faster on the hard unsat instances, and at least 1 order of magnitude faster on the satisfiable instances.

7. Conclusions
This paper describes a new SAT solver, Chaff, which has been shown to be at least an order of magnitude (and in several cases, two orders of magnitude) faster than existing public domain SAT solvers on difficult problems from the EDA domain. This speedup is not the result of sophisticated learning strategies for pruning the search space, but rather, of efficient engineering of the key steps involved in the basic search algorithm. Specifically, this speedup is derived from:
• a highly optimized BCP algorithm, and
• a decision strategy highly optimized for speed, as well as focused on recently added clauses.

8. References
[1] Baptista, L., and Marques-Silva, J.P., "Using Randomization and Learning to Solve Hard Real-World Instances of Satisfiability," Proceedings of the 6th International Conference on Principles and Practice of Constraint Programming (CP), September 2000.
[2] Bayardo, R., and Schrag, R., "Using CSP look-back techniques to solve real-world SAT instances," Proceedings of the 14th National (US) Conference on Artificial Intelligence (AAAI-97), AAAI Press/The MIT Press, 1997, pp. 203-208.
[3] Biere, A., Cimatti, A., Clarke, E.M., and Zhu, Y., "Symbolic Model Checking without BDDs," Tools and Algorithms for the
Analysis and Construction of Systems (TACAS'99), number 1579 in LNCS, Springer-Verlag, 1999. (http://www.cs.cmu.edu/~modelcheck/bmc/bmc-benchmarks.html)
[4] DIMACS benchmarks, available at ftp://dimacs.rutgers.edu/pub/challenge/sat/benchmarks
[5] Freeman, J.W., "Improvements to Propositional Satisfiability Search Algorithms," Ph.D. Dissertation, Department of Computer and Information Science, University of Pennsylvania, May 1995.
[6] Kunz, W., and Stoffel, D., Reasoning in Boolean Networks, Kluwer Academic Publishers, 1997.
[7] Marques-Silva, J.P., "The Impact of Branching Heuristics in Propositional Satisfiability Algorithms," Proceedings of the 9th Portuguese Conference on Artificial Intelligence (EPIA), September 1999.
[8] Marques-Silva, J.P., and Sakallah, K.A., "GRASP: A Search Algorithm for Propositional Satisfiability," IEEE Transactions on Computers, vol. 48, pp. 506-521, 1999.
[9] McAllester, D., Selman, B., and Kautz, H., "Evidence for invariants in local search," Proceedings of AAAI'97, MIT Press, 1997, pp. 321-326.
[10] Stephan, P., Brayton, R., and Sangiovanni-Vincentelli, A., "Combinational Test Generation Using Satisfiability," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, pp. 1167-1176, 1996.
[11] Velev, M., FVP-UNSAT.1.0, FVP-UNSAT.2.0, VLIW-SAT.1.0, SSS-SAT.1.0, Superscalar Suite 1.0, Superscalar Suite 1.0a, available from: http://www.ece.cmu.edu/~mvelev
[12] Velev, M., and Bryant, R., "Effective Use of Boolean Satisfiability Procedures in the Formal Verification of Superscalar and VLIW Microprocessors," Proceedings of the Design Automation Conference, 2001.
[13] Zhang, H., "SATO: An efficient propositional prover," Proceedings of the International Conference on Automated Deduction, pp. 272-275, July 1997.
GRASP options (A): +T100 +B10000000 +C10000000 +S10000 +V0 +g40 +rt4 +dMSMM +dr5
GRASP options (B): +T100 +B10000000 +C10000000 +S10000 +g20 +rt4 +dDLIS
SATO options (A): -g100
SATO options (B): [default]
Chaff options (A): cherry.smj config
Chaff options (B): cherry.smj config plus maxLitsForConfDriven = 10
Abort timeout was 1000s for these sets, except for &'ed sets, where it was 100s.