
Simplicity, Truth, and the Unending Game of Science

Kevin T. Kelly
Department of Philosophy, Carnegie Mellon University
[email protected]
April 11, 2005

Abstract

This paper presents a new explanation of how preferring the simplest theory compatible with experience assists one in finding the true answer to a scientific question when the answers are theories or models. Science is portrayed as an infinite game between science and nature. Simplicity is a structural invariant reflecting sequences of theory choices nature could force the scientist to produce. It is demonstrated that among the methods that converge to the truth in an empirical problem, the ones that do so with a minimum number of reversals of opinion prior to convergence are exactly the ones that prefer simple theories. The idea explains not only simplicity tastes in model selection, but aspects of theory testing and the unwillingness of natural science to break symmetries without a reason.

0.1 Introduction

In natural science, one typically faces a situation in which several (or even infinitely many) available theories are compatible with experience. Standard practice is to choose the simplest theory among them and to cite Ockham's razor as the excuse (figure 1). Simplicity is understood in a variety of ways in different contexts.

Figure 1: Ockham to the rescue. (Theories T0 through T4, all compatible with experience, ordered from simple to complex. "Amazing, Ockham! How do you do it?")

For example, simpler theories are supposed to posit fewer entities or causes (Ockham's original formulation), to have fewer adjustable parameters, to be more unified and elegant, to posit more uniformity or symmetry in nature, to provide stronger explanations, or to be more strongly cross-tested by the available data. But in what sense is Ockham's razor truly an excuse? For if you already know that the truth is simple, you don't need a special inductive method like Ockham's razor to justify your choice (figure 2).

Figure 2: A Platonic Dilemma: Case I. (The truth is already known to be simple: "Who needs you?")

And if you don't already know that the truth is simple, the truth might be complex, so what justifies you in concluding that it is simple (figure 3)? How could a fixed bias toward simplicity indicate the possibly complex truth any better than a broken thermometer that always reads zero can indicate the temperature? You don't have to be a card-carrying skeptic to wonder what the tacit connection between simplicity and truth-finding could possibly be. This essay explains the connection between simplicity and truth by modelling inquiry as an infinite game between nature and the scientist, in which the scientist's aim is to converge to the truth in a way that minimizes revisions of earlier guesses prior to convergence. But first I review, briefly, some standard explanations that fall short of the mark.

Figure 3: A Platonic Dilemma: Case II. ("Why not this one?")


0.2 Some Traditional Explanations of Ockham's Razor

G. W. von Leibniz (1951) explained Ockham's razor by means of a direct appeal to God (figure 4). Since God is omnipotent and kind (to simplicity-loving scientists, at least), it follows that the actual world is the most geometrically elegant universe that could possibly produce such a rich array of effects. Hence, Ockham's razor points us straight at the truth. This explanation merely underscores the desperate nature of the question.

Figure 4: Leibniz's Theological Explanation. ("The best!" Zap! Truth.)

I. Kant addressed the question in his Critique of Judgement (1988), where he discusses Ockham's razor under the rubric of design in nature. His explanation is characteristically idealistic: we construct the truth simply, so we could be no more wrong in concluding the world to be simple than King Midas could be wrong in concluding that the next thing he touches is gold (figure 5). But the world isn't that cooperative. J. S. Mill, B. Russell and many other philosophers have held that induction works by means of a uniformity of nature postulate that makes inductive inferences deductive. But any principle can be called a postulate. The question is why it should be.

Figure 5: Kant's Idealistic Explanation. ("Make it so!" Zap! Truth.)

Many philosophers have observed that simple theories have various virtues, most notably, that simpler or more unified theories are more thoroughly tested by a given evidence set (e.g., Popper 1968, Glymour 1981, Friedman 1983). For if a theory has many free parameters (ways of being true) then new evidence simply sets the parameters and there is no risk of the theory itself being refuted altogether. But a simple theory does carry the risk of being refuted. It seems only fair to pin a medal of valor on the simple theory for surviving its self-imposed ordeal. But the question is truth, not valor, and the true theory might not be simple, in which case it wouldn't be valorous. To assume otherwise amounts to wishful thinking: the epistemic sin of concluding that the truth is as pleasing (severely testable, explanatory, unified, uniform, symmetrical) as you would like it to be. Rudolf Carnap (1950) sought uniformity of nature in logic itself. This logic amounts to the imposition of greater prior probabilities on more uniform worlds, where uniformity is judged with respect to an arbitrarily selected collection of predicates. The argument goes like this (figure 6). Suppose there are but two predicates, green and blue, and that everything is either green or blue. Suppose there are two observable objects, a and b.

Figure 6: Carnap's Logical Explanation. (The two uniform worlds get probability 1/3 each; the two non-uniform worlds get 1/6 each. "Uniformity is more probable!")

Two worlds are isomorphic just in case a one-to-one substitution of names takes you from one world to the other in a way that preserves the basic predicates in your language. Hence the uniform world in which a and b are both green is in its own isomorphism class, as is the uniform world in which a and b are both blue. The two non-uniform worlds in which a and b have different colors can each be reached from the other by a one-to-one, color-preserving substitution of names, so they end up in the same isomorphism class.

Now Carnap invokes the principle of indifference to put equal probabilities of one third on each of these three isomorphism classes and invokes it again to split the one third probability on the non-uniform class over the two non-uniform worlds. The resulting probability distribution is then biased so that uniform worlds get probability one third and non-uniform worlds get probability one sixth. So uniform worlds are more probable than non-uniform worlds (in this tiny example by a factor of two, but the advantage increases as observable individuals are added). Nelson Goodman (1950) objected that whatever is logical ought to be preserved under translation and that Carnap's uniformity bias based on linguistic syntax isn't. For uniformly green and uniformly blue experience are uniform. But one can translate green and blue into grue and bleen, where grue means green if a and blue if b, and bleen means blue if a and green if b (figure 7).

Figure 7: Goodman's Grue Argument. (The same worlds redescribed: now the uniformly grue and uniformly bleen worlds are the "uniform" ones. "Simplicity is a matter of description?")

Then in the grue/bleen language, the worlds that used to be non-uniform are now uniformly grue or uniformly bleen, respectively, and the worlds that used to be uniform are non-uniform, for green means grue if a and bleen if b, and blue means bleen if a and grue if b. Since logical inferences are based entirely on syntax and syntactically the situation between green/blue and grue/bleen is entirely symmetrical, uniformity cannot be a feature of logical syntax. The moral is that Carnap's story makes uniformity of nature a mere matter of description. But a guide to truth could not be a mere matter of description, since truth doesn't depend upon how it is described.

0.3 Statistical Explanations

Statistical and machine learning methods embody versions of Ockham's razor which one would expect to be justified by improved arguments. Let's check. A major player in the scientific methodology business today is Bayesian methodology. The basic idea is to allow personal biases to enter into statistical inferences, where personal bias is represented as a prior probability measure over possibilities. The prior probability of hypothesis H is then combined with experience E_t available at t via Bayes' theorem

to produce an updated probability of H at t + 1, which represents your updated opinion concerning H:

P_{t+1}(H) = P_t(H | E_t) = P_t(H) P_t(E_t | H) / P_t(E_t).

It is clear from the formula that your prior opinion P_t(H) is a factor in your posterior opinion P_{t+1}(H), so that the simplest theory compatible with the new data ends up being most probable in the updated probabilities. But that is simply how a Bayesian agent uses or implements Ockham's razor (figure 8). It doesn't begin to explain why you or any other Bayesian agent should implement Ockham's razor as opposed to any other prior bias, which would be equally justified by the circular Bayesian story (it's the bias of the person who happens to have it).

Figure 8: The Circular Bayesian Explanation. ("I assume simplicity!" "So I assume simplicity!")

A more interesting Bayesian explanation goes like this. True, it begs the question simply to impose a prior bias toward simple theories, so let's be fair and impose equal prior probabilities on competing theories, be they simple or complex. Now suppose for concreteness that we have just two theories, simple theory S and complex theory C(θ) with free parameter θ which (again for concreteness) can be set to any value from 1 to k (figure 9). Suppose, further, that S consistently entails E_t, as does C(1), but that for all other parameter values i, C(i) is refuted by E_t. Thus, P_t(E_t | S) = P_t(E_t | C(1)) = 1, but for all i distinct from 1, P_t(E_t | C(i)) = 0. Suppose, again, that you have no prior idea which parameter value would be the case if C(θ) were true (that's what it means for the parameter to be free). So P_t(θ = i | C) is uniform.[1] Turning the crank on Bayes' theorem, one obtains P_t(S | E_t) / P_t(C | E_t) = k. So even though the complex theory could save the data just as well as the simple one, the simple theory that did so without any ad hoc fiddling ends up being confirmed much more sharply by the same data E_t (e.g., Rosenkrantz 1983). Surely that explains how severe testability is a mark of truth, for doesn't the more testable theory end up more probable after a fair contest?
[1] This is a discrete version of the typical restrictions on prior probability in Bayesian model selection described in (Wasserman 2000).
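To make the arithmetic concrete, the following is a minimal sketch (mine, in Python; the value k = 3 is a hypothetical choice) of the calculation just described:

```python
from fractions import Fraction

k = 3  # hypothetical number of settings of the free parameter in C(theta)

# Fairness at the level of theories: P(S) = P(C) = 1/2, with C's half
# split uniformly over the k parameter settings.
prior = {"S": Fraction(1, 2)}
prior.update({f"C({i})": Fraction(1, 2) / k for i in range(1, k + 1)})

# The data E_t are entailed by S and by C(1) but refute C(2), ..., C(k).
likelihood = {h: (1 if h in ("S", "C(1)") else 0) for h in prior}

# Bayes' theorem: posterior proportional to prior times likelihood.
total = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / total for h in prior}

print(posterior["S"] / posterior["C(1)"])  # 3, i.e., k
```

The ratio comes out to k however k is chosen: the more settings the complex theory keeps in reserve, the more sharply the simple theory is confirmed by the same data.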

Figure 9: The Miracle Explanation. (Fairness to both theories: S gets prior 1/2; C(1), C(2), C(3) get 1/6 each.)

One must beware when Bayesians speak of fairness, for probabilistic fairness between blue and non-blue implies a strong bias toward blue in a choice among blue, yellow, and red. That is all the more true in the present case: fairness between S and C induces a strong bias for S with respect to C(1), . . . , C(k). One could just as well insist upon fairness at the level of parameter settings rather than at the level of theories (figure 10). In that case, one would have to impose equal probabilities of 1/(k + 1) over the k + 1 possibilities {S, C(1), . . . , C(k)}.

Figure 10: The Miracle Reversed. (Fairness to worlds: S, C(1), C(2), C(3) each get prior 1/4. "It would be a miracle if the parameter were set precisely to 1." "Now that's fair.")
Now C(1), and hence C, will remain forever at least as probable as S in light of evidence agreeing with S.
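A matching sketch (again mine, same hypothetical k) of the reversed assignment shows the point: with equal priors on the k + 1 possibilities, the same data leave C(1) exactly as probable as S.

```python
from fractions import Fraction

k = 3
possibilities = ["S"] + [f"C({i})" for i in range(1, k + 1)]

# Fairness at the level of worlds: each possibility gets prior 1/(k + 1).
prior = {w: Fraction(1, k + 1) for w in possibilities}
likelihood = {w: (1 if w in ("S", "C(1)") else 0) for w in possibilities}

total = sum(prior[w] * likelihood[w] for w in possibilities)
posterior = {w: prior[w] * likelihood[w] / total for w in possibilities}

print(posterior["S"], posterior["C(1)"])  # 1/2 and 1/2: no advantage for S
```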

Classical statisticians explain Ockham's razor in terms of overfitting (cf. Wasserman 2003). Overfitting occurs when you want to estimate a sampling distribution by setting the free parameters in some statistical model. In that case, the expected squared predictive error of the estimated model will be higher if the model employed is too complex (e.g., Akaike 1973, Forster and Sober 1994). This is a kind of objective, short-run connection between simplicity and truth-finding, but it doesn't really address the question at hand, which is how Ockham's razor helps you find the true model, which is quite another matter from which model to use to estimate the underlying sampling distribution. The quickest way to see why is this: suppose God were to tell you that the true model has fifty free parameters. Then on a small sample, the overfitting argument would still urge you to use a much smaller model for estimation and prediction purposes (figure 11).

Figure 11: The Overfitting Explanation. ("Behold the Truth, little human!" "Thanks, God, but this simplistic model will still predict better.")

So the argument would lead to the rejection of the true model when it is handed to you; a practice that couldn't be directed toward finding the true model. Part of the story is that the respective senses of error are different. Getting close to the underlying sampling distribution might not get you close to the form of the true model, since distributions arbitrarily close to the true distribution could be generated by models arbitrarily far from the true model.[2] Thus, distance from the theoretical truth is typically discontinuous with distance from the true sampling distribution, so minimizing the latter distance may fail to get you close in the former, as in the case of God informing you that the true model is very complex. Another point about overfitting is that even to the extent that it does explain the role of simplicity in statistical prediction, it is tied essentially to examples in which the data are stochastically generated, leaving one to wonder why simplicity should have any role in deterministic inference problems, where it still feels like a good idea. Finally, there are many theorems of the sort that some method equipped with a prior bias toward simplicity is guaranteed to converge in some sense to the true model as experience (or sample size) increases (cf. Wasserman 2003). That would indeed link Ockham's razor with truth-finding if it could be shown that other possible biases don't converge to the truth. But they do. The logic of convergence results is not that Ockham's advice points at or indicates the truth, but that it is washed out or swamped, eventually, by accumulating experience, even if the advice is so misleading as to throw you off the track for a long time (figure 12). But lots of alternative biases will also get out of the way eventually, so that's hardly a ringing endorsement of Ockham's razor. What is required is an argument that Ockham's razor is, in some sense, the best possible bias for finding the true theory.
[2] This is particularly true when the features of the model have counterfactual import beyond prediction of the actual sampling distribution, as in causal inference (Spirtes et al. 2000).

Figure 12: The Convergence Explanation. ("How is this flat tire helping me to get home?" "Because you can fix it eventually." Fssssss!)

0.4 Post Mortem

To recapitulate, the standard explanations of the mysterious relationship between simplicity and theoretical truth are either circular, wishful, or irrelevant. Still, they provide useful information about how it can't be explained. For indication of the truth would be a strong and unique connection between simplicity and truth, but it is too strong a connection to establish without begging the question at the outset, as Leibniz and the Bayesians do. On the other hand, mere convergence in the limit to the truth can be established a priori without circular premises but, alas, it is too weak to single out simplicity as the right bias to have. The crux of the puzzle, then, is to come up with a notion of helping to find the truth that is strong enough to single out simplicity as the right bias to have but that is not so strong as to demand a question-begging appeal to Ockham's razor at the outset in order to establish it. Thus, the appropriate notion of help must be stronger than convergence in the limit and weaker than indication in the short run. The account to be developed below steers between these two extremes by considering a refined concept of convergence, namely, convergence with a minimum number of reversals of course prior to arrival at the goal (figure 13).

Figure 13: Three Kinds of Help. (Indication: too strong. Convergence: too weak. Straightest convergence: just right?)

This is stronger than mere convergence in the limit (which says nothing about minimizing kinks in the path) and is weaker than indication (which requires that the advice point straight at the goal immediately, wherever the goal happens to be). It will be demonstrated that an ongoing bias toward simplicity minimizes kinks in your course to the truth in a certain precise sense. But first I illustrate the approach by showing that something similar happens almost every time you ask for directions.


0.5 Asking for Directions

Suppose you are headed home on a road trip and get lost in a small town. In frustration, you stop to ask a local resident how to get home (figure 14).

Figure 14: Asking for Directions. ("Which way to...?" "Go back two blocks. The freeway is on the right.")

Before you can even say where you are headed, he gives you the usual sort of advice: directions to the nearby highway entrance ramp, which happens to be a few blocks back in the direction you just came from. Now suppose that, in a fit of hubris, you disregard the resident's advice in favor of some intuitive feeling that your home is straight ahead (figure 15).

Figure 15: Hubris! ("How could he answer without knowing our destination? The sun was on the right, so Pittsburgh must be straight ahead.")

That ends up being a bad idea (figure 16).

Figure 16: The U-turn Argument. (Your extra detour begins with a needless U-turn; the advised route runs straight to Pittsburgh. "Told ya!")

You leave town on a small rural route that twists and bounces into the wilderness and that ends up winding its wild way over the Allegheny mountains. At that point you concede the error of your ways and turn around to finally heed the local resident's advice. Thereafter, everything goes straightforwardly. The entrance ramp to the freeway was indeed just a few blocks away and the freeway provides as straight a route home as is feasible in mountainous country. As you speed your way uneventfully homeward you contemplate: if only you hadn't ignored the local resident's advice, you wouldn't have added that useless, initial U-turn to your otherwise direct trip home. Let's take stock of a few of the striking features of this mundane story. First, the local resident's advice was indeed helpful, since it would have put you on the straightest possible path home. Second, by disregarding the advice you incurred an extra U-turn or kink in your route. What is particularly vexing about the initial U-turn is that it occurs even before you properly begin your journey. It's a sort of navigational original sin that you can never be absolved of. Third, the resident didn't need to know where you were going in advance in order to give you helpful advice. Any slicker asking for directions in a town hundreds of miles from the nearest city would do best to get on the freeway. Hence, the resident's ability to provide useful information without knowing where your home is doesn't require an occult or circular explanation. Suppose, on the other hand, that the resident could give you a compass course home before knowing where you are headed. That would require either a circular or an occult explanation (an Ouija board or divining rod). Fourth, even the freeway is not perfectly straight, so the resident's advice provides no guarantee against future course reversals, even though it is the best possible advice. Finally, the resident's advice is the best possible advice even though it points you away from your goal at the outset.


If help required that you be aimed in the right direction, then the resident would have to give you a compass heading home, which wouldn't be possible unless he already knew where your goal was or had an Ouija board or divining rod. So the typical situation in which you ask for directions home from a distant small town has all the fundamental features that an adequate explanation of the truth-finding efficacy of Ockham's razor must have. Perhaps Ockham also provides fixed advice that puts you on the best possible route to the truth without pointing you at the truth and without guarantees against future course reversals along the way. It remains to explain what the freeway to the truth is and how Ockham's advice leads you to it.

0.6 The Freeway to the Truth

Even English lexicography suggests the essential connection between freeways and truth-finding, for both changes in course and changes in opinion are called changes in attitude. According to this analogy, Ockham's advice should somehow minimize changes of opinion prior to convergence to the right answer.[3] Let's consider how the story goes in the case of a very, very simple truth-finding problem. Suppose there is an emitter of discrete, readily detectable particles at arbitrary intervals and we know that it can emit at most finitely many particles altogether (figure 17). The question is how many particles it will ever emit.

Figure 17: Counting Particles (Effects). ("?!" Burp!)

What makes the problem interesting is that an arbitrarily long interval without new particles can easily be mistaken for total absence of future particles. This problem has more general significance than might first be apparent, for think of the particles as detectable effects that may be arbitrarily hard to detect. Typically, a theory with extra free parameters will imply extra effects tied to the parameters, but tuning the parameters toward zero makes the extra effects arbitrarily small and, therefore, arbitrarily hard to detect, so that they are detected arbitrarily late.
[3] The idea of counting mind-changes already appears in (Putnam 1965). Since then, the idea has been studied extensively by computer scientists interested in computational learning (cf. Jain et al. 1999 for a review). The focus, however, is on categorizing the complexities of problems rather than on singling out Ockham's razor as an optimal method. Oliver Schulte and I began looking at retraction minimization as a way to severely constrain one's choice of hypothesis in the short run in 1996 (cf. Schulte 1999a, 1999b). Schulte has also applied the idea to the inference of conservation laws in particle physics (Schulte 2001). The ideas in this essay build upon and substantially simplify and generalize the initial approach taken in (Kelly 2002), (Kelly and Glymour 2004) and (Kelly 2004).


For example, in curve fitting, the curvature of a quadratic curve may be so slight that it requires a huge amount of data to notice that the curve is non-linear.[4] So the theory that the curve is quadratic but not linear predicts the eventual detection of effects that would never appear under the linear theory. Similarly, the curvature of a cubic curve may be so slight that it is arbitrarily hard to distinguish from a quadratic curve. The point generalizes to typical model selection settings regardless of the interpretation of the parameters. So deciding among models or theories with different free parameters is quite similar, after all, to counting particles. Ockham's original formulation of Ockham's razor is to not multiply entities without necessity. It is necessary (on pain of outright inconsistency) to assume as many particles as you have seen, but it is not necessary to assume more, so that if you conclude anything, you should conclude exactly as many particles as you have seen so far (figure 18). The most aggressive Ockham method is the counting method that simply concludes that every particle has been seen at every stage.

Figure 18: Ockham in Action. ("What is the least possible number?" Burp!)

Other Ockham methods may, more realistically, suffer from an arbitrarily long period of doubt at the outset and after each mind-change prior to building up confidence in the next Ockham answer. Ockham's razor, itself, says nothing about how long this confidence-building time should last, and the following argument for Ockham's razor doesn't imply anything about how long it should be either. It simply requires you to adopt some Ockham method, whether the method waits or not. That is as it should be, since even believers in short-run evidential support (e.g., Carnap and the Bayesians) allow for arbitrary individual differences concerning the time required for confidence buildup. Other intuitive formulations of empirical simplicity conspire to the view that the Ockham answer should be the exact count. First, the Ockham theory that there are no more particles than you have seen is the most uniform theory compatible with experience, for it posits a uniformly particle-free future. Second, the Ockham theory is the most testable theory compatible with experience, for if it is false, you will see another particle and it will be decisively refuted. Any theory that anticipates more particles than have been seen might be false because there are fewer particles than anticipated, in which case it will never be refuted decisively since the anticipated particles might always appear later.
[4] It is assumed that the data are increasingly precise but inexact; else three points would settle the question (Popper 1968). The same point holds if the data are noisy. In that case, tuning the parameters toward zero makes the effects statistically undetectable at small sample sizes (cf. Kelly and Glymour 2004, Kelly 2004 for the translation to stochastic problems).


Third, the Ockham theory is most explanatory, since the theory that posits extra particles fails to explain the times at which those particles appear. The theory that there are no more particles fails to posit extra, unexplained times of appearance. Fourth, the Ockham theory is most symmetrical, since the particle-free future is preserved under permutation of times, whereas a future punctuated by new particle appearances would be altered by such permutations. Fifth, the Ockham theory has the fewest free parameters, because each time of appearance of a new particle is a free parameter in a theory that posits extra particles. So in spite of its apparent triviality, the problem of counting things that are emitted from a box does illustrate a wide range of intuitive aspects of empirical simplicity. That isn't so surprising in light of the analogy between particles and empirical effects tied to free parameters. If you follow any Ockham solution to the particle-counting problem, then you change your mind in light of increasing data at most once per particle. If the true count is k, then you change your mind at most k times. By way of comparison, suppose that you have a hankering to violate Ockham's razor by producing a different answer (figure 19). You might reason as follows.

Figure 19: Ockham Violation!

The particle emitter has overturned every successive Ockham answer in the past (i.e., zero, one, two, and three), so you expect it will overturn the current Ockham answer four as well. So by induction on Ockham's unbroken losing streak in the past, you anticipate failure again and guess five (or some greater number of your choosing) rather than the Ockham answer four. Philosophers of science call this the negative induction from the history of science (Laudan 1981). Why side with Ockham rather than with the negative induction against him? Think of the game of inquiry as starting from scratch at the moment you first say five. I am not concerned about what you did in the past. Efficiency is future-directed and is unencumbered by debts or slush-funds accumulated in the past. Accordingly, the subproblem entered when you first say five consists of the restriction of possibilities to those consistent with current experience. Furthermore, only mind-changes incurred after entering the subproblem are counted in that subproblem. You start the subproblem with a clean slate but all future mind-changes are charged against you. So consider the subproblem entered when you first say five upon having seen only four particles. There is no deadline by which the fifth particle you anticipate has to show up, so you may have to wait a long time for it even if you are right. You wait and wait and wait (figure 20). Your graduate students exhaustively examine the machine for possible malfunctions.


Figure 20: The Pressure Builds.

Colleagues start talking about the accumulation of null results and refer to the failure of appearance as an anomaly. True, the posited particle could appear (to your everlasting fame) at any time, so your theory isn't strictly refuted. Nonetheless, you feel increasing pressure to switch to the four-particle theory as the anomaly evolves subjectively into a crisis. This increasing pressure comes not from the weight of the evidence, as philosophers are wont to say, but from your strategic aim to converge to the truth whatever nature throws at you. For if you never change your mind from five to four and the fifth particle never appears, you will converge for eternity to five when the truth is four. So at some time of your choosing, you must (on pain of converging to the wrong answer) cave in to the pressure from nature's strategic threat and switch back to the (Ockham) theory that the machine will produce just four particles (figure 21). Won't that make for interesting gossip in the Particle Counting Association, where you are celebrated as the sole defender of the five-particle theory?[5]

Figure 21: The Agony of Retreat. ("Told ya!")

To summarize, in the subproblem entered when you first say five, nature can force you to change your mind at least once (from five to four) in the manner just described, without presenting a fifth particle.
[5] I am alluding, of course, to Thomas Kuhn's (1962) celebrated historical theory of the structure of scientific revolutions. Null experiments generate anomalies which evolve after careful consideration into crises that ultimately result in paradigm change. Kuhn concludes, hastily, that the change is an unlawful matter of politics that has little to do with finding the truth. I respond that it is a necessary consequence of the logic of convergence to the truth after a violation of Ockham's razor, as will become clear in what follows. Many of the celebrated scientific revolutions in physics have been the results of Ockham violations (e.g., Ptolemy vs. Copernicus, Fresnel vs. Newton, Einstein vs. Newton, and Darwin vs. creationism). In each of these cases, a theory positing extra free parameters (with attendant empirical effects) was chosen first and a simpler theory was thought of later and came to replace the former, often after an accumulation of null experiments.


The same is not true of Ockham, who enters the same subproblem saying four (or nothing at all) and who never changes his mind until the next particle appears. Thereafter, Ockham changes his mind exactly one time per extra particle. But you can be forced by nature to change your mind at least once per extra particle (on pain of not converging to the truth) in the same way already described; for a long period during which there are exactly i particles forces you to say i on pain of not converging to the truth, after which nature can present the (i + 1)-th particle, etc. (figure 22).

Figure 22: Ockham Avoids Your Initial U-turn. (Your answers: 5, 4, 5, 6, ...; Ockham's answers: 4, 5, 6, ... Burp!)

Hence, if a solution violates Ockham's razor in the particle counting problem, then in the subproblem entered at the time of the violation, whatever sequence of outputs the Ockham solution produces, the violator can be forced by nature to produce a sequence including at least the same mind-changes plus another one (the initial U-turn from five to four). You should have listened to Ockham! The same argument works if you violate Ockham's razor in the other possible way, by saying three when four particles have been seen. For nature can refuse to present more particles until you change your mind to four, on pain of never converging to the right answer if the right answer is four. But in the same subproblem, Ockham would already have said four if he said anything at all, and in either case you can be forced into an extra mind-change in each answer. So the U-turn argument also explains the need for maintaining consistency with the data. So there is, after all, a fairly tight analogy between particle counting and getting on the freeway. Your initial mind-change from five to four is analogous to your initial U-turn back to the local resident's house en route to the highway. Thereafter, no matter what the true answer is, you can be forced to change your mind at least once for each successive particle, whereas Ockham changes his mind at most once per successive particle. These mind-changes are analogous to the unavoidable curves and bends in the freeway. So no matter what the truth is, you start with a U-turn Ockham avoids and can be forced into every mind-change Ockham performs thereafter. As in the freeway example, you have botched the problem before you even properly get started. In both stories, the advice is the best possible. Nonetheless, it does not impose a bound on future course reversals; nor does it point you toward your goal by some occult, unexplained mechanism.
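The forcing argument can be simulated directly. The following minimal sketch (mine, in Python; the method names and the true count of three are hypothetical) plays nature's strategy against the counting method and against a violator whose first guess posits one unseen particle, modelling the subproblem entered at a violation: nature withholds particles until the current method caves in to the exact count, then reveals another particle.

```python
def counting_method(seen, history):
    # Ockham's most aggressive method: answer the exact count observed so far.
    return seen

def violator(seen, history):
    # A hypothetical Ockham violator: its very first guess posits one unseen
    # particle; once it caves in, it behaves exactly like the counting method.
    if not history:
        return seen + 1
    if history[-1] == seen + 1:
        return seen  # cave in to nature's strategic threat
    return seen

def force(method, true_count):
    """Nature's forcing strategy: withhold new particles until the method's
    answer matches the count presented so far, then reveal one more particle.
    Returns the method's answer sequence and its number of mind-changes."""
    answers, seen = [], 0
    while True:
        guess = method(seen, answers)
        answers.append(guess)
        if guess == seen:            # the method has settled on the exact count
            if seen == true_count:   # nature has no particles left to reveal
                break
            seen += 1                # reveal the next particle
    changes = sum(1 for a, b in zip(answers, answers[1:]) if a != b)
    return answers, changes

print(force(counting_method, 3))  # ([0, 1, 2, 3], 3): one change per particle
print(force(violator, 3))         # ([1, 0, 1, 2, 3], 4): the extra initial U-turn
```

The violator's answer sequence begins with a U-turn the counting method never performs, and it can still be forced through every mind-change the counting method performs thereafter.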

A striking feature of the explanation is that it is entirely game-theoretic. There is no primitive notion of support or confirmation by data of the sort that characterizes much of the philosophical literature on induction and theory choice (figure 23).[6]

Figure 23: Pulling the Rug Out. (The rug is labeled "Support".)

Nor are there prior probabilities that foster the illusion of support of general conclusions by a few observations. The phenomenology of support by evidence emerges entirely from the aim of winning this truth-finding game against nature. Furthermore, the game is essentially infinite. For if there were an a priori bound on the time by which the next particle would arrive if it arrives at all, then you could simply out-wait nature and avoid changing your mind altogether. So the argument is situated squarely within the theory of infinite zero-sum games, which is the topic of this volume. Here is why the reference to subproblems is essential to the U-turn argument. Suppose that you are asleep when you see the first particle and that when you see the second particle you wake up and guess three, expecting that you will also sleep through the third (figure 24). Thereafter, you always agree with Ockham.

Figure 24: Ockham Still Wins in Subproblem. (Your answers: 3, 2, ...; Ockham's answers: 2, 2, ...)

If the third particle doesn't appear right away you can be forced to change your mind to two, but that's only your second retraction; Ockham wouldn't have done better. Now that you have caught up with Ockham, you match him no matter how many particles you see in the future. But that is only because you saved a retraction in the past by sleeping through the first particle. That is like hoarding a slush fund to hide future mismanagement from the public. In the subproblem entered when you say three, the slush fund is emptied and you have to demonstrate your efficiency from scratch. In that subproblem, your first reversal of opinion back to two gets added to all your later mind-changes and you never catch up, so Ockham wins. The moral: an arbitrary Ockham solution beats you in the subproblem in which you violate Ockham's razor, but the Ockham solution does as well as you in every subproblem, so the Ockham solution is better.
[6] In this respect, my approach is a generalization and justification of the anti-inductivism of K. Popper (1968).


Sometimes I encounter the purported objection that you could always weakly dominate[7] the mind-changes of a given Ockham solution by suspending judgment for a long time before selecting any answer. Indeed, every method is weakly dominated in mind-changes by a clone who suspends judgment for a longer time than the given method and who then starts producing the same answers, so there is no question of any Ockham method being admissible (weakly undominated). More to the point: in the best case for the violator, the anticipated fifth particle might appear immediately after the violation, before the method even has a chance to become queasy about over-estimating (figure 25). In that case, the violator's output sequence in the subproblem entered at the violation begins with five, five, whereas Ockham's output sequence in the same subproblem begins with four, five, which is worse.

Figure 25: Best Case Fairy to the Rescue. ("Looking for this?" Burp! Your answers: 5, 5, ...; Ockham's answers: 4, 5, ...)

Hence, the Ockham method doesn't weakly dominate the violator's mind-changes in the subproblem in question. But that merely explains why the U-turn argument, which does establish the superiority of an arbitrary Ockham solution over an arbitrary non-Ockham solution, is not a weak dominance argument. The U-turn argument essentially involves a worst-case dimension lacking in weak dominance, for nature can force the non-Ockham solution from five back to four (on pain of convergence to the wrong answer) by withholding particles long enough and can then reveal another particle to make it say five, four, five, which is never produced by any Ockham method and which properly extends the Ockham sequence four, five.[8] Nor, for that matter, is the U-turn argument a standard worst-case or minimax argument, for there is no fixed bound on mind-changes for any solution to the counting problem (nature can force an arbitrary method through any number of mind-changes). That may help to explain why the explanation has remained elusive.
[7] In the sense of being as good in all worlds and better in some.

[8] The fact that among the Ockham methods, delayers weakly dominate non-delayers in terms of mind-changes provides an explanation why it is intuitive to delay for a while. But you must choose one of these weakly dominated methods, else you don't converge to the truth at all, so neither the proposed argument nor the weak dominance argument explains which weakly dominated solution you choose. Again, this accords with individual differences about when to make the leap from moping skepticism after a surprise to the next Ockham hypothesis.


0.7 A General Conception of Scientific Problems

A scientific problem specifies a set Ω of possible worlds the scientist must succeed in, together with a question Q which partitions Ω into mutually exclusive potential answers. The aim is to find the true answer for w no matter which world w in Ω you happen to live in. If Q is a binary partition, one thinks of a decision or test problem for one of the two cells vs. the other. If it is a fixed range of alternatives extensionally laid out in advance, one speaks of theory choice. If it is an infinite partition specified only by some latent criterion determining the kind of theory that would count as success, the situation might be described as discovering the truth. The most characteristic thing about empirical science is that you don't get to see w in its entirety. Instead, you get some incomplete evidence or information about w, represented by some subset of Ω containing w. The set of all possible information states you might find yourself in is modelled as the collection of open sets V in a topological space over Ω. A scientific problem is just a triple (Ω, V, Q), where (Ω, V) is a topological space and Q partitions Ω (figure 26). The idea is that although the scientist never gets to see the actual world w itself, he does get to see ever smaller open neighborhoods of w.

Figure 26: A Scientific Problem. (A world in Ω, the answer to Q true in w, and an information state in V encountered in w.)
The use of topology to model information states is not a mere stipulation, for information concerns verifiable effects and topology is perhaps best understood as the mathematical theory of verifiability.[9] The point is seen most directly as follows. Identify each proposition with the set of possible worlds or circumstances in which it would be true, so propositions may be modelled as subsets of the set of possible worlds. Say that a proposition is verifiable if and only if there exists a method or procedure that examines experience and that eventually illuminates a light if the proposition is true and that never illuminates the light otherwise. For example, illuminating the light when a particle appears yields a verification procedure for the proposition that at least one particle will appear. The contradiction is the empty set of worlds (it can't possibly be true) (figure 27). It is verifiable by the trivial verification procedure that never illuminates its light.
[9] This may sound odd to geometers, but students of intuitionistic logic will be less surprised, since intuitionism is motivated by a notion of formal verifiability and also has a topological semantics. Also, students of computability will recall that topology shows up in the Rice-Shapiro theorem, which characterizes computable verifiability (i.e., recursive enumerability). Furthermore, topology is used to model partial information states in denotational semantics (Scott 1982).



Figure 27: Verifiable Propositions are Open Sets. (Contradiction, tautology, finite conjunction, arbitrary disjunction.)

Similarly, the tautologous proposition consists of the whole set of worlds and is verifiable by the trivial procedure that turns on its light a priori. Suppose that two verifiable propositions A, B are given. Their conjunction A ∧ B is verifiable by the procedure that turns on its light if and only if the respective verification procedures for A and for B have both turned on their lights. Finally, suppose a collection D of verifiable propositions is given. Their disjunction ⋁D is verifiable by the procedure that turns on its light just in case the procedure for some proposition A ∈ D turns on its light (you will see that light eventually as long as each respective procedure is only a finite distance away). Hence, the verifiable propositions V over Ω constitute the open sets of a topological space (Ω, V). So every theorem about open sets in a topological space is also true of ideal empirical verifiability. One of the most characteristic features of topology is that open sets are closed under arbitrary union but only under finite intersection. That is also explainable in terms of verifiability. Suppose you are given an infinite collection C of verifiable propositions. Is there a verification procedure for ⋀C? Not always. For the respective verification procedures for the elements of C may all turn on their lights, but at different times, so that there is no time by which you can be sure that it is safe to turn on your light for ⋀C (figure 28). That is an instance of the classical problem of induction: no matter how many lights you have seen go off, the next one might never do so. So not only are the axioms of topology satisfied by empirical verifiability; the characteristic asymmetry in the axioms reflects the problem of induction. In a given topological space (Ω, V), the problem of induction arises in a world w with respect to proposition H just in case every information state (open proposition) true in w is compatible both with H and with ¬H (figure 29).[10]
[10] Of course, ¬H is understood to be Ω − H.


Figure 28: The Demon of Arbitrary Conjunction. ("Gotta decide sometime!")
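The light-bulb semantics can be made concrete. Here is a minimal sketch (mine, in Python, with generators standing in for verification procedures; the helper names are my own) of verifiers for the particle problem, of finite conjunction, and of disjunction. The demon of arbitrary conjunction lives in the fact that at no finite stage of an infinite conjunction is it safe to turn on the light:

```python
from itertools import count, islice

def at_least(k, world):
    """Verifier for 'at least k particles appear' in a bit-stream world:
    the light (True) goes on as soon as k ones have been observed."""
    seen = 0
    for t in count():
        seen += world(t)
        yield seen >= k

def conj(v1, v2):
    # Finite conjunction: the light goes on once both component lights are on.
    for a, b in zip(v1, v2):
        yield a and b

def disj(verifiers):
    # Disjunction (finitely many components here, as a finite approximation):
    # the light goes on once some component light has come on.
    on = False
    for stage in zip(*verifiers):
        on = on or any(stage)
        yield on

# A world with exactly three particles, at stages 2, 5, and 9.
world = lambda t: 1 if t in (2, 5, 9) else 0

v = conj(at_least(2, world), at_least(3, world))
print(list(islice(v, 12)))  # the light goes on at stage 9 and stays on

d = disj([at_least(4, world), at_least(1, world)])
print(list(islice(d, 5)))   # the light goes on at stage 2

# By contrast, the infinite conjunction 'at least k particles for every k'
# has no verifier: at every finite stage only finitely many conjunct lights
# are on, so there is never a safe time to turn on the light.
```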

Figure 29: Demons Live in Boundaries.
In standard, topological parlance, the problem of induction arises with respect to H in w just in case w is a boundary point of H. So the demons of induction live in the boundaries of propositions one would like to know the truth values of. In a world that is an interior point of H, one eventually receives information verifying H (since an interior point of H has a neighborhood contained in H). Hence, not every verified proposition is verifiable, since a verified proposition merely has non-empty interior, whereas a verifiable proposition is open. But if a non-verifiable proposition is verified, some open proposition entailing it is verified, so information states can still be identified with open sets. Less abstractly, recall the particle-counting problem. A possible world determines how many particles emerge from the machine for eternity and when each such particle emerges. Thus, one may model possible worlds as ω-sequences of bits, where 1 in position n indicates the appearance of a new particle at stage n and 0 indicates that no new particle appears at n. The information available by stage n is then a finite bit string (b_0, . . . , b_{n-1}). But that isn't a proposition. The corresponding proposition is the set of all ω-sequences of bits that extend the finite bit string observed so far. Call this proposition the fan with handle (b_0, . . . , b_{n-1}), since all the worlds satisfying the fan agree up to n and then fan out in all possible ways from n onward (figure 30). Any disjunction of verifiable events is verifiable (see above), so any union of fans is also verifiable, and hence open (just wait for the handle of one of the fans to appear before turning on the light). The resulting space over arbitrary ω-sequences of bits has been very heavily studied in topology, where it is known as the Cantor space. In the particle-counting problem it is assumed that at most finitely many particles will appear, so one must restrict Cantor space down to the ω-sequences that converge to 0.
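The fan construction is equally easy to write down. The following minimal sketch (mine, in Python; the function names are my own) represents a fan by its handle, a world by a function from stages to bits, and the verifier for a union of fans by waiting for some handle to appear in the evidence:

```python
def in_fan(handle, world):
    # A world lies in the fan iff its first len(handle) bits match the handle.
    return all(world(t) == b for t, b in enumerate(handle))

def fan_union_verifier(handles, world, stage):
    """The light for the union of fans at the given stage: on just in case
    the evidence seen so far already begins with one of the handles."""
    evidence = [world(t) for t in range(stage)]
    return any(evidence[:len(h)] == list(h) for h in handles)

# A world with particles exactly at stages 1 and 3 (and none afterward).
world = lambda t: 1 if t in (1, 3) else 0

print(in_fan((0, 1, 0), world))                      # True
print(fan_union_verifier([(0, 1), (1,)], world, 5))  # True: handle (0, 1) appeared

# The proposition 'exactly two particles' is not open: every fan around this
# world also contains worlds with three or more particles, so no finite
# handle ever makes it safe to turn on the light.
```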


Figure 30: A Fan of Sequential Worlds. (The observed handle; the possible extensions fan out.)
Consider the proposition that exactly two particles will be observed for eternity. This proposition is impossible to verify (no matter what you see, more than two particles may appear later). Hence, its interior is empty and every element is a boundary point, where the problem of induction arises. In this space, the boundary points are particularly suggestive of the problem of induction (figure 31).

Figure 31: Boundary Points in Cantor Space.
For example, consider the world (1, 1, 0, . . .), where the dots indicate an infinite tail of zeros. No matter how far you travel down this sequence (i.e., no matter what information state true of the world you are in), there exist worlds in which more than two particles appear later than you have observed so far. So nature is in a position to drag you down the sequence (1, 1, 0, . . .) until you cave in and say two, and is still free to show you another particle, as in the U-turn argument. So the U-turn argument hinges on the topologically invariant structure of boundary points between answers to a question.

0.8 The Unending Game of Science

Each scientific problem determines an infinite, zero-sum game of perfect information (cf. Kechris 1991) between the scientist, who responds to each information state by selecting an answer (or by refusing to choose), and the impish inductive demon, who responds to the scientist's current guess history with a new information state.

The demon is not a real entity in nature; he merely personifies the difficulty of the challenge the scientist poses for himself by addressing a given scientific problem. In this truth-finding game, the demon and the scientist take turns, starting with the scientist (figure 32). Together, the demon and the scientist produce a pair of ω-sequences: an information sequence produced by the demon and an answer sequence produced by the scientist.

Figure 32: The Players. (The scientist's answers A1, A2, A3, ... alternate with the demon's information states.)

Life would be too easy for the demon if he were allowed to withhold some crucial information for eternity, so the scientist is the victor by default if the demon fails to present complete, true information about some world in Ω in the limit.[11] In other words, the information sequence {E_i : i ∈ ω} presented by the demon should be nested downward and should have the completeness property that there exists w ∈ Ω such that for each open neighborhood S of w, there exists i such that E_i ⊆ S.[12] If the demon fulfills his informational obligations, the scientist wins only if he stabilizes, eventually, to the answer true in some world w the demon presents true information for. In other words, there must exist a stage in the play sequence after which the scientist's answer is correct of w. A winning strategy for the scientist in the truth-finding game is called a solution to the underlying empirical problem. For example, the obvious counting strategy solves the particle-counting problem.[13]
[11] One might reply that if it is impossible for the demon to fulfill his duty, the scientist loses, since even the total experience received in the limit of inquiry doesn't settle the question. The game could be set up to reflect either viewpoint.

[12] The demon can accomplish this feat for an arbitrary world if, for example, the underlying topology is separable (has a countable basis). For in that case the demon can choose a world w and can enumerate all the basis elements containing w as {B_i : i ∈ ω}. Then at stage i he can present E_i = ⋂_{j ≤ i} B_j. Since each open neighborhood S of w is a countable union of basis elements, there exists j such that B_j ⊆ S, so for all i ≥ j, E_i ⊆ S. But there are also cases in which the demon can present complete information even though the space is not separable. For example, suppose that Ω = ω_1 + 1 and that basic open sets are of the form S_α = {β : β ≥ α}, where α ≤ ω_1. This space is not separable, but the demon can perform his appointed duty in an arbitrary world α by deciding to leap straight to S_α after finite time.

[13] It is interesting to inquire into the topological nature of solvability, since solvability is a topological invariant and must, therefore, be grounded in a problem's topological structure. For example, if the space is separable and the question is a countable partition, then solvability is equivalent to each cell being Σ^0_2 Borel (cf. Kelly 1996). Such questions are not strictly necessary for understanding Ockham's razor, and are therefore omitted from this essay.
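Here is a minimal sketch (mine, in Python, with a finite horizon standing in for the infinite play; the world chosen is hypothetical) of the protocol: the scientist moves first by answering the evidence seen so far, the demon responds with the next information state, and the scientist wins a play just in case his answers stabilize to the answer true in the demon's chosen world.

```python
def play(scientist, demon_world, rounds=30):
    """One finite-horizon approximation of the truth-finding game: at each
    stage the scientist answers the evidence seen so far, then the demon
    reveals the next bit of its chosen world."""
    evidence, answers = [], []
    for t in range(rounds):
        answers.append(scientist(evidence))  # scientist moves first
        evidence.append(demon_world[t])      # demon presents more information
    return answers

def counting_strategy(evidence):
    # The obvious solution: conjecture exactly the number of particles seen.
    return sum(evidence)

# A demon presenting complete information about a three-particle world.
world = [0, 1, 0, 0, 1, 1] + [0] * 24

answers = play(counting_strategy, world)
print(answers[-1] == sum(world))  # True: the answers have stabilized to 3
```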


0.9 Comparing Mind-Changes

Consider two possible sequences of answers, α and β. Say that α maps into β (written α ≤ β) just in case there is an answer and order preserving mapping (not necessarily one-to-one) from positions in α to positions in β, where suspension of judgement is a wild-card in the former sequence that matches any answer in the latter (figure 33).[14] Since the mapping preserves answers and order, it also preserves mind-changes (not counting mere suspension as a mind-change).

Figure 33: Top Output Sequence Better Than Bottom.

So when α maps into β, one may say that α is as good as β so far as mind-changes are concerned. Say that α maps properly into β (written α < β) if, in addition, the latter fails to map into the former, as in figure 33. Then α is better than β. One can also say of two sets of output sequences that the former is as good as the latter just in case each element of the former is as good as some element of the latter (figure 34), and is better than the latter if, in addition, the latter is not as good as the former.[15]

Figure 34: Top Set Better Than Bottom.

The former set is strongly better than the latter just in case each of the former's elements is better than some element of the latter that is not as good as any element of the former (figure 35).[16]
[14] In fact, an Ockham method's output sequences map injectively into output sequences the demon can force out of an arbitrary method.

[15] This is not the same as weak dominance, since the existential quantifier allows for a worst-case pairing of output sequences by the mapping.


Extend the symbols ≤ and < to sets of output sequences accordingly.

Figure 35: Top Set Strongly Better Than Bottom.

The set of output sequences of a solution to a problem is the set of all output sequences α such that there exists some world in the problem in which the method produces α. Then one can say of two methods that the former is as good, better, or strongly better just in case their respective sets of output sequences bear the corresponding relation. Finally, say that a method is efficient in a problem just in case it is as good as any other method in each subproblem of the problem. Again, the idea is that inefficiency is forward-looking and should not be offset by foibles or stockpiles of credit (slush funds) earned in the past. By way of illustration, the counting solution is efficient in the particle-counting problem, as is any Ockham solution to this problem (remember that Ockham solutions can suspend belief for arbitrary periods of time). That is because the demon can force any solution through any ascending sequence of answers and Ockham methods produce only ascending sequences of answers. Furthermore, any non-Ockham solution is worse than any Ockham solution in the subproblem entered when the violation occurs. Indeed, it was shown that the violator is strongly worse than any Ockham solution in that subproblem because the demon can force the violator into any ascending sequence after the U-turn back to the Ockham answer. Hence, the counting problem has the remarkable property that its efficient solutions are exactly its Ockham solutions. That is surely a result worth pressing as far as possible! But first, Ockham's razor must be extended with mathematical precision to arbitrary empirical problems.
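The mapping relation is simple enough to state as code. The following minimal sketch (mine, in Python; a greedy matcher) decides whether α maps into β, with '?' marking suspension of judgement:

```python
def maps_into(alpha, beta):
    """Does answer sequence alpha map into beta (alpha <= beta)?
    Positions of alpha are matched to non-decreasing positions of beta;
    '?' (suspension of judgement) on the left matches any answer."""
    j = 0
    for a in alpha:
        # Advance to the earliest remaining position of beta that matches a;
        # repeated or wild-card entries may reuse the current position.
        while j < len(beta) and a != '?' and beta[j] != a:
            j += 1
        if j == len(beta):
            return False
    return True

print(maps_into(('4', '5'), ('5', '4', '5')))  # True: Ockham's sequence is as good
print(maps_into(('5', '4', '5'), ('4', '5')))  # False: so (4, 5) is better
print(maps_into(('?', '4', '5'), ('4', '5')))  # True: '?' is a wild-card
```

The first two lines reproduce the U-turn comparison of section 0.6: the violator's sequence five, four, five properly extends Ockham's four, five.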

0.10 What Simplicity Isn't

The concept of simplicity appears at first to be a hodge-podge of considerations including uniformity of nature, theoretical unity, symmetry, testability, explanatory power, and minimization of entities, causes, and free parameters. But in spite of this manifold appearance, it remains possible that simplicity is some deep, unified, structural concept that manifests these various aspects depending on the particular structure of the problem addressed.
[16] The requirement that the sequence mapped to is not as good as any of the former method's output sequences precludes cases in which a method is strongly better than itself. For example, if there are only two answers in the particle problem, even and odd, then each output sequence of the obvious method that answers according to whether the number of observed particles is even or odd is better than some other output sequence of the same method (e.g., (E, O, E, O, . . .) < (O, E, O, E, O, . . .)).

It is suggestive in this regard that the trivial particle-counting problem already illustrates all of the intuitive aspects of simplicity just mentioned and that they seem to cluster around the nested problems of induction posed by the repeated possibility that a new particle might appear. It is easy, at least, to say what simplicity couldn't be. It couldn't be anything fixed that does not depend on the structure of the problem. For it is a commonplace in the analysis of formal procedures that different algorithmic approaches are efficient at solving different problems. So if simplicity did not somehow mold itself objectively to the structure of the particular problem addressed, Ockham's razor couldn't possibly be necessary for efficient convergence to the truth in a wide range of distinct problems possessing different structures. That is the trouble with concepts of simplicity like notational brevity (Li and Vitanyi 1997), uniformity of worlds (Carnap 1950), prior probabilistic biases, and historical entrenchment (Goodman 1983). Left to themselves, none of these ideas conforms to the essential structural interplay between a problem's question and its underlying informational topology, so none of them could contribute objectively to truth-finding efficiency over a range of different problems. Of course, all of them could be forced to do so by the adoption of rules for adjusting notation to reflect the relevant structure. But then it is plain that brevity, uniformity of worlds, prior probability, and other proposed accounts of simplicity are just inert markers used to indicate the true, underlying structure of simplicity in the problem itself. Far better to study the genuine article directly, rather than through a veil of irrelevant formal distractions.

0.11 Simplicity and Ockham's Razor Defined

In spite of the hubris suggested by the title of this section, it must be conceded that we now have an unusual and considerable advantage: we already know what kind of justification of Ockham's razor we are after. Hence, we can solve backward for Ockham's razor by generalizing the features of particle-counting that support the U-turn argument. The key to the U-turn argument is the demon's ability to force at least a given sequence of mind-changes out of an arbitrary solution. In the particle-counting problem, the demon can present information from the zero-particle world until the scientist caves in and concludes that there will be zero particles (on pain of not converging to the true answer) (figure 36). Then the demon can present a particle followed by no further particles until the scientist concludes one particle, again on pain of not converging to the true answer, and so forth. This can't go on forever, though, because the demon must present data from some world in the problem, and all such worlds present at most finitely many particles. Hence, for each finite ascending sequence of answers, the demon can force an arbitrary solution to the particle-counting problem into an output sequence that the given sequence maps into.

Figure 36: Demon Forcing a Sequence of Answers. ("0, 1, 2, 3, ?, ?, ?, ? If you never say 4, you'll miss the truth forever!")

But the demon has no strategy for dragging an arbitrary solution through any non-ascending sequence, say, (1, 0). For the obvious counting method will wait to see the first particle before concluding one and, thereafter, the demon can no longer trick it into thinking that there are no particles, since the particle has already been presented. That is a fundamental asymmetry in the problem. More generally, if α is a finite, non-repetitive sequence of answers, then the α-avoidance game for a problem is won by the scientist just in case the demon fails to present an appropriate information sequence or the scientist wins the truth-finding game and fails to produce a sequence of conjectures as bad as α. The demon wins if he presents appropriate information that makes the scientist lose the truth-finding game or that somehow lures the scientist into producing an output sequence as bad as α. When the demon has a winning strategy in the α-avoidance game, one may say that the demon can force α out of an arbitrary solution to the problem. For example, it was shown that the demon has a winning strategy in the (0, 1, 2, . . . , n)-avoidance game in the particle-counting problem, since every method can be induced to produce that output sequence (or a sequence that is at least as bad). Then say that α is demonic in a problem just in case the demon can force it from an arbitrary solution to the problem.

The demonic sequences in a problem reflect a deep relationship between the question Q and the underlying topology V. The ability of the demon to force the demonic sequence (zero, one, two) implies that there is a zero-particle world that is a limit point of one-particle worlds, each of which is a limit point of two-particle worlds, and so forth. So demonic sequences represent iterated problems of induction within the overall problem (figure 37). According to intuition, simpler answers are associated with the most deeply embedded problems of induction, for starting from zero the demon can drag a solution through every ascending sequence, but after presenting some particles he can never drag the counting solution back to zero. That suggests an elegant concept of empirical simplicity. If A is an answer in a problem, then say that the A-sequences are the demonic sequences starting with A. Say that answer A is as simple as B just in case the A-sequences are as bad as the B-sequences, and that A is simpler than B just in case the A-sequences are worse than the B-sequences. This definition agrees with intuition in the counting problem and, hence, in parameter-freeing problems of the usual sort, such as curve-fitting.

The proposed account has a striking intuitive advantage over the familiar idea that simplicity has something to do with Euclidean dimension or with continuous measures over a parameter space. For if God were to suddenly tell you that the true parameter setting is a rational number, the topological dimension of the space drops to zero and the nice-looking continuous measure becomes a skewed discrete sum that loses all respect for the former distinctions of dimensionality. But the rational-valued subspace of the parameter space preserves boundary points, and hence demonic sequences, and hence simplicity in the proposed sense. That fits with intuition, for simplicity doesn't seem to dissolve when God delivers the news.
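Before moving on, the demon's waiting strategy described above can be animated. The sketch below is hypothetical in its encoding (a stream with one entry per stage, 1 marking a newly emitted particle) and plays the strategy against any conjecturing function; the horizon cap merely keeps the loop finite when the method is not a solution.

    def demon_force(method, targets, horizon=1000):
        # Wait at the current particle count until the method concedes the
        # target answer (a solution must concede, on pain of not converging
        # to the truth), then release one more particle and repeat.
        data, forced = [], []
        for target in targets:
            for _ in range(horizon):
                if method(data) == target:
                    forced.append(target)
                    break
                data.append(0)   # another stage with no new particle
            else:
                return forced    # the method never caved: not a solution
            data.append(1)       # the demon releases one more particle
        return forced

    count = lambda data: sum(data)            # the obvious counting method
    print(demon_force(count, [0, 1, 2, 3]))   # [0, 1, 2, 3]

Ascending target sequences of any finite length are forced in just this way, mirroring figure 36.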

Figure 37: Demonic Sequence in the Particle-Counting Problem.

Now say, quite naturally, that an answer is Ockham in a problem just in case it is as simple as any alternative answer. Ockham's razor is then: never say an answer unless it is Ockham for the current subproblem. Finally, a solution is Ockham if it solves the problem and always heeds Ockham's razor. The Ockham answer is typically unique, but not always. If there is no Ockham answer, you must suspend judgment. If there is more than one, choosing among them is allowed. Suspending judgment is always allowed. It may sound odd to allow an arbitrary choice among Ockham answers, but keep in mind that two hypotheses could be maximally simple (no other answer's demonic sequences are worse) without being Ockham (every other answer's demonic sequences map in). The truly objectionable choices turn out to be among answers of the latter sort, as will be explained below. Here is a handy, equivalent formulation of the Ockham answer concept, where ∗ denotes concatenation. The proof is immediate.17

Proposition 1 (Ockham characterization) In an arbitrary problem, A is Ockham if and only if for every demonic sequence α, A ∗ α maps into a demonic sequence.
17 Suppose that A is Ockham. Let α be a demonic sequence. Since A is Ockham, α maps into some demonic sequence A ∗ β. Then so does A ∗ α, so we have left-to-right. For right-to-left, suppose that A is not Ockham, so there exists a demonic α that maps into no demonic sequence of the form A ∗ β. But then A ∗ α doesn't map into any demonic sequence of the form A ∗ β either.
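Proposition 1 can also be spot-checked on a finite approximation. Reusing collapse and maps_into from the earlier sketch, the fragment below truncates the particle-counting problem at three particles; the truncation, and the premise that the demonic sequences are exactly the strictly ascending count sequences, are assumptions of the illustration rather than parts of the theorem.

    from itertools import combinations

    # Demonic sequences of particle counting, truncated at count 3:
    # the strictly ascending sequences of counts.
    demonic = [list(c) for r in range(1, 5) for c in combinations(range(4), r)]

    def is_ockham(A):
        # Proposition 1: A is Ockham iff for every demonic sequence a,
        # the concatenation [A] + a maps into some demonic sequence.
        return all(any(maps_into([A] + a, d) for d in demonic) for a in demonic)

    print(is_ockham(0), is_ockham(1))  # True False, before any particle is seen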


0.12 Efficiency, Games and Determinacy

Lifting the U-turn argument to the general version of Ockham's razor just defined requires a short digression into the nature of efficient solutions. A method is as good as a set of sequences just in case the method's set of output sequences is as good as the given set, and similarly for the other ordering relations defined above. Then it is immediate by definition that the demonic sequences are as good as an arbitrary efficient solution to the subproblem. It is far less trivial whether an efficient solution must be as good as the set of demonic sequences. This is where Ockham's razor interfaces with recent developments in descriptive set theory (cf. Kechris 1991 for details). Say that a game is determined just in case one player or the other has a winning strategy, and that a scientific problem is determinate just in case for each finite answer sequence α, the α-avoidance game is determined. This turns out to be a surprisingly mild restriction, since D. Martin (1985) has proved the following, remarkable theorem (cf. Kechris 1991):

Proposition 2 (Borel determinacy) Every Borel game is determined.

Descriptive set theory is standardly carried out on a Polish space, which is a completely metrizable space with a countable basis. Say that a problem is typical just in case its topology is a restriction of a Polish space and its question has only countably many possible answers. By a standard embedding theorem (cf. Kechris 1991), a restriction of a Polish space is homeomorphically embeddable into the Baire space (which is just like Cantor space except that arbitrary natural numbers may occur along the sequences, so Cantor space is a restriction). Furthermore, say that a problem is Borel just in case each answer is Borel with respect to the problem's underlying topology. The typical Borel problems are a rich class covering just about anything one might encounter in model selection. Then:

Proposition 3 Typical Borel problems are determinate.18

Furthermore, every typical, solvable problem is Borel,19 so:

Proposition 4 Typical solvable problems are determinate.

The following results all concern solutions and hence are vacuously true if the problem in question is unsolvable. Hence:

Proposition 5 In each of the following propositions, "determinate" may be replaced with "typical".
18 Convert the typical problem into a topologically equivalent problem on a restriction of the Baire space. The demon's requirement amounts to producing a nested sequence that eventually entails an entry in each position of some ω-sequence of natural numbers. Converging to the truth amounts to eventually producing an answer true of the element the demon specifies, which is a Borel condition, since the answers to a Borel problem are Borel and the existential quantification over answers is countable. Finally, it is a Borel condition for the scientist to avoid a given, finite sequence α. So the winning set for the α-avoidance game is a Borel set in the Baire topology.

19 Since solvability implies that each answer is Σ⁰₂ Borel (Kelly 1996).


Hence, the restriction to determinate problems is very mild. I don't substitute "typical" for "determinate" in the results themselves, since the latter concept is still more general. Returning to the question at hand, we have:

Proposition 6 (Efficiency Characterization) Let the problem be determinate. An arbitrary solution is efficient if and only if it is no worse than the demonic sequences in each subproblem.20

So not only is an efficient solution as good as any solution, it is as good because it is as good as the demonic sequences, which are as good as any solution since the demon can force any solution through an arbitrary demonic sequence.

0.13 Efficient Solutions = Ockham Solutions

Here is the main result. Ockham is indeed necessary and sufficient for efficiency in an extremely broad range of problems. The hypothesis of determinateness makes the proof21 surprisingly easy.

Proposition 7 (Ockham Equivalence Theorem) Let the problem be determinate. Then the efficient solutions are exactly the Ockham solutions.

The theorem loses its normative bite if the problem has no efficient solution, for "ought" implies "can". At least each problem that involves counting or the freeing of parameters tied to verifiable effects is efficiently solvable, so the most important examples are covered. Some examples of a different character are examined in detail below.
20 For the proof, suppose that a solution to a given, determinate problem is efficient. Then in each subproblem it is as good as an arbitrary solution. Let τ be an output sequence of the efficient method in a given subproblem. Collapse the redundancies and question marks out of τ to produce σ. No solution to the subproblem avoids producing an output sequence as bad as σ, so the scientist has no winning strategy in the σ-avoidance game. So by determinateness, the demon has a winning strategy, so σ is demonic. But σ maps into τ. So an efficient method is also as good as the demonic sequences in each subproblem.

21 Let a determinate problem be given. For the necessity argument, suppose that M violates Ockham's razor upon entering some subproblem by producing non-Ockham answer A. Let D be the set of demonic sequences for the subproblem. Since A is not Ockham, there exists (by proposition 1) a demonic sequence α in the subproblem such that A ∗ α does not map into any demonic sequence. Hence, M is not as good as D. Suppose for reductio that M is efficient. Then (by proposition 6) M is as good as D. Contradiction. For sufficiency, it suffices to argue that every finite sequence of Ockham answers encountered in subproblems successively reached as experience increases maps into some demonic sequence in the first subproblem. For then an Ockham solution, which produces only sequences of Ockham answers interspersed with question marks, is no worse than the demonic sequences in each subproblem and, hence, is efficient. In the base case, each Ockham answer A in a subproblem is consistent with current experience, so the singleton sequence (A) can be forced by the demon in the subproblem and, hence, is demonic in the subproblem. Now consider a finite, nested sequence of subproblems P0, . . . , Pn+1 with respective Ockham answers A0, . . . , An+1. By the induction hypothesis, (A1, . . . , An+1) is demonic in P1. Furthermore, since experience in P1 consistently extends that in P0, whatever the demon can force in P1 he can force in P0, so (A1, . . . , An+1) is demonic in P0. So since A0 is Ockham in P0, (A0, A1, . . . , An+1) is demonic in P0, which proves the lemma and the theorem.


In spite of the topic's obvious importance, I leave the general investigation of efficient solvability for another essay. More can be shown for the particle-counting problem and for others of its attractive kind. For such problems have the special feature that in each subproblem, if A is an Ockham violation upon entering the subproblem, then there exists an Ockham answer U upon entering the subproblem such that the binary sequence A ∗ U does not map into any demonic sequence for the subproblem. Say that such problems are stacked.22 Examples of non-stacked problems illustrate intuitive ideas about empirical symmetry and will be considered in the next section. The result is:

Proposition 8 (Strong Ockham Necessity for Stacked Problems) In a stacked, determinate problem, each non-Ockham solution is strongly worse than each efficient solution.23

This fits closely with the spirit of the freeway example and with what is going on in particle counting, curve fitting, and other nested parameter problems. The property of being stacked can be viewed as the topological essence underlying the very strong Ockham intuitions attending such problems.
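The stacking property can itself be spot-checked on the truncated counting sketch above: upon entering the initial subproblem, the Ockham answer 0 blocks every violation, since no ascending sequence embeds a descent back to 0.

    def blocked(A, U):
        # Stacking: the binary sequence A * U maps into no demonic sequence.
        return not any(maps_into([A, U], d) for d in demonic)

    print(all(blocked(A, 0) for A in [1, 2, 3]))  # True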

0.14 Testing as an Instance of Ockham's Razor

Suppose you want to decide whether some theory is true or not. That poses a binary question: the theory vs. the theory's denial. Typically, the theory is understood to be a closed subset of the underlying space of worlds, so its complement is verifiable and, hence, the theory is refutable. Although Neyman and Pearson (1933) explicitly say that the choice of null hypothesis should be based on the practical consequences of error, everyone chooses the refutable (e.g., point) hypothesis as the null hypothesis and its denial as the alternative. On the proposed account of simplicity, this decision to accept the refutable hypothesis until it is refuted is an instance of Ockham's razor and is underwritten by the U-turn argument, so that the proposed theory of efficient theory choice subsumes this aspect of testing practice as a special case.
22 To see that the particle-counting problem is stacked, suppose that A is not Ockham upon seeing, say, four particles. Let U be the Ockham answer "four". Then the binary sequence A ∗ U maps into no demonic sequence in the subproblem. For if A posits fewer than four particles, A maps into no demonic sequence, since the demon can't force an arbitrary solution into a refuted answer. If A posits more particles, then (A, U) maps into no demonic sequence, since all such sequences are ascending.

23 For the proof, consider an efficient solution to a stacked, determinate problem and suppose that you solve a given subproblem but violate Ockham's razor upon entering it by producing A. Let U be the Ockham answer promised by the stacking property. Then since you already say A and U is compatible with the subproblem, the demon can force you into U. That is the initial U-turn resulting from the Ockham violation. Consider an output sequence τ of the optimal method. Then τ maps into some demonic σ (proposition 6) and hence into A ∗ U ∗ σ. Since you already say A and U ∗ σ is demonic (since σ is demonic and U is Ockham), A ∗ U ∗ σ maps into one of your output sequences in the subproblem. Of course, A ∗ U ∗ σ is worse than τ. Furthermore, A ∗ U maps into no demonic sequence, so neither does A ∗ U ∗ σ. Since all the optimal method's output sequences map into demonic sequences, it follows that A ∗ U ∗ σ maps into none of the optimal method's output sequences. Hence, you are strongly worse than the optimal method.


First, observe that the demon can force you to conclude the refutable hypothesis H (by showing you a boundary point in the hypothesis, since closed sets contain all of their boundary points). Then he can show you data refuting the theory. So only (H, ¬H) and its subsequences are demonic. Hence, only H is Ockham (proposition 1), so (by proposition 7) every efficient solution says H (or suspends) until H is refuted, which reflects practice. Finally, that practice is efficient (since its output sequences are all demonic), so Ockham's razor bites and you should heed his advice.

The trouble with standard conceptions of hypothesis testing is that they ignore the possibility of extra mind-changes. Yes, it is refutable to say that the bivariate mean of a normal distribution is precisely (0, 0), since {(0, 0)} is closed (and hence refutable) in the underlying parameter space. But what if you want to test the non-refutable and non-verifiable hypothesis that exactly one component of the mean is zero? Solving this binary question requires multiple mind-changes, as in particle-counting and other model selection problems. For the demon can make it appear that both components are zero until you say "no" (with high probability, as sample size increases), then can reveal deviation of one component from zero until you say "yes", and then can reveal deviation of the other component from zero until you say "no" again, for a total of two mind-changes. Essentially, you are just counting deviations of mean components from zero, as we were counting particles before. So the demonic sequences are all subsequences of (no, yes, no), the obvious method of counting nonzero mean components is efficient, and the unique Ockham hypothesis at each stage is the one that agrees with the current nonzero mean count. So Ockham's razor bites and you should heed his advice (as you naturally would in this example).

Since testing theory usually isn't applied until all the parameters in a model are fixed by point estimates, it appears as though a testing theory for refutable (closed) hypotheses is good enough. The result is that the essential involvement of Ockham's razor in testing theory is missed, and so the strong analogy between model selection and testing with multiple mind-changes is missed as well. The proposed account of Ockham's razor therefore suggests a new, more unified foundation for classical statistics, whose development lies beyond the scope of this explorative essay.
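The forcing schedule in the bivariate example can be written out explicitly. The rendering below is deliberately idealized: sampling noise is suppressed, and the "data" at each stage is just the number of components whose deviation from zero has been made apparent so far (a real test would manage error probabilities as the sample grows).

    def exactly_one_zero(apparent_nonzero):
        # Count apparently nonzero components, just as the counting method
        # counts particles, and answer the binary question accordingly.
        return 'yes' if apparent_nonzero == 1 else 'no'

    # The demon's schedule: both components look zero, then one deviates,
    # then the other.
    print([exactly_one_zero(k) for k in [0, 0, 1, 1, 2]])
    # ['no', 'no', 'yes', 'yes', 'no'] -- collapsing gives (no, yes, no)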

0.15 Ockham and Respect for Symmetry

When there are two equally simple answers compatible with the data, Ockham can't help you decide among them, and the strong intuition is to wait for nature to break the symmetry prior to choosing. For example, modify the particle-counting problem so that particles come in two colors, green and blue, and you have to specify the total number of each that will ever be emitted. Assume also that you can hear particles rattle down the faucet before they emerge from the machine (figure 38). Having seen no particles, you hear the first one coming. What do you conclude? It would seem, nothing, for on what basis could you decide whether it will be green or blue? This is not mere skepticism, since after the symmetry is broken you will eventually have to leap to the bold Ockham hypothesis that no more particles are coming. Instead, it is respect for symmetry, one of the strongest intuitions in science since Greek times.

Figure 38: Breaking Symmetry.

That leads to an intriguing idea. Perhaps the U-turn argument also explains our strong hesitance to break symmetries in experience. Then respect for symmetry would simply be Ockham's razor conforming itself to the structure of symmetrical problems. That is correct. Consider how Ockham's razor applies to the case at hand. When you hear the rattling that announces the first particle, you have entered a new subproblem. There are intuitively two equally simple answers at that point: "one green, zero blue" or, symmetrically, "zero green, one blue". But neither of these answers is Ockham. For each answer constitutes a unit demonic sequence, but neither binary sequence consisting of the two symmetrical competitors is demonic in the subproblem, since the demon can't take back the first particle after its color is observed. So Ockham demands silence, and we already know from proposition 7 that every efficient solution to the problem must heed this advice. Is there an efficient solution? Indeed: just heed the advice by counting the total number of particles whose colors have been seen and by suspending judgment when the next rattle is heard. Hence, respect for symmetry has normative bite in this example.

This symmetrical problem is also a nice example of a non-stacked problem. For consider the answer "zero green, one blue". There is no Ockham answer one can concatenate to it in the subproblem entered with the first rattle, because there is no Ockham answer at all. And the violator is not strongly worse than the Ockham method just described in that subproblem, because the demon can force even an optimal method to say the same answer the violator chose in advance, and the violator produces no output sequence worse than that. Still, the optimal method is not just better but strongly better over just the worlds in which the violator's forbidden guess is false; that falls short of being strongly better in the whole subproblem, but it is more than merely being better.

The same argument works even after a run of a thousand exclusively green particles, in which case it might be objected that past experience does break the symmetry between blue and green. But the subproblem is unaltered if one exchanges the colors of the particles occurring in each world. Since the unaltered structure of the subproblem preserved through this reflection is all that efficiency in the subproblem depends on, no non-circular, performance-based account of Ockham's razor could possibly explain why it is better in the subproblem to say green rather than blue on the basis of a higher frequency of green particles over blue particles prior to entering the subproblem.
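The efficient solution just described is easy to write down. The event encoding below ('G', 'B', and 'rattle') is invented for the illustration:

    def symmetric_counter(events):
        # Conjecture the (green, blue) totals seen so far, but suspend
        # judgment whenever a rattle announces a particle of unseen color.
        if events and events[-1] == 'rattle':
            return '?'
        return (events.count('G'), events.count('B'))

    print(symmetric_counter([]))               # (0, 0)
    print(symmetric_counter(['rattle']))       # '?'  symmetry not yet broken
    print(symmetric_counter(['rattle', 'G']))  # (1, 0)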


Again, this is not mere skepticism, because infinite projections beyond the data are allowed as soon as the symmetry in the subproblem is truly broken. In the preceding problem, the counting question slices the problem into topologically invariant simplicity degrees corresponding to particle counts in spite of occasional symmetries (e.g., when a particle has rattled but has not yet been seen). In other problems, symmetry is so pervasive that Ockham's razor can't slice them into objective simplicity degrees (figure 39).


Figure 39: Overly Symmetrical Problems.

For example, suppose you have to report not only how many particles will appear but when each one will appear (forgetting about color). It might seem, at first, that the simplest answer is to say that you have seen all the particles already and that they appeared exactly when they were observed to, since if you were asked only how many particles there are, you would only be permitted to say the number seen so far. That is so, if you choose to conceive of the sequence-identification problem as a refinement of the particle-counting problem. The trouble is that the sequence-identification problem also refines a wide variety of alternative problems that would lead to different answers. For example, twoticles are non-particles up to stage two and particles thereafter. There are finitely many particles if and only if there are finitely many twoticles, so the underlying space is unaltered by the translation. But the answers to the two counting problems are different, and the U-turn argument leads to correspondingly different recommendations (i.e., to count particles or to count twoticles, respectively). Since the sequence-identification problem refines the problems of counting particles, oneicles, twoticles, threeticles, and so on, it can't consistently favor one kind of counting over another without making a global, symmetry-breaking choice in favor of one of its possible coarsenings. The only sensible resolution of this Babel of alternative coarsenings is for Ockham to steer clear of it altogether. And that's just what the proposed theory says.

First of all, no answer is Ockham in this problem, since every demonic sequence is of unit length. For consider a single answer. The answer is true in just one world, which the demon can present until you take the bait. So each unit sequence of answers can be forced. But for any alternative answer (satisfied by an alternative world) there is a least stage by which the two worlds cease agreeing and diverge, and some solution refuses to be convinced of the first answer (on pain of converging to the wrong answer) until the divergence point is already passed (figure 40). So the demon can force no binary sequence of answers from an arbitrary solution. Hence (proposition 6), there can be no efficient solution, since no solution to this problem succeeds without mind-changes. So there are lots of solutions to this problem, but no efficient ones.


Figure 40: The Trouble With Singleton Answers.

Hence, even if there were an Ockham answer, there would be no efficient method to put normative force in the U-turn argument! Ockham is both mute and toothless (perhaps mute because toothless) in this problem. Again, that is the correct answer. The sequence-identification problem is completely symmetrical in the sense that any homeomorphism of the space onto itself results in the very same problem (since each permuted world still ends up in a singleton answer over the same topological space). So there is no objective, structural sense in which one answer is simpler than another, any more than there is any objective, physical sense about where zero degrees longitude lies. Coordinate systems are not real because they aren't preserved under physical or geometrical symmetries; philosophical notions of simplicity (e.g., brevity, sequential uniformity, entrenchment) are not real because they aren't preserved under problem symmetries. To seek real truth-finding efficiency in distinctions that aren't really in the problem is like trying to extract energy by formally transforming coordinates that aren't really in the world.

The situation is quite different in the particle-counting problem. There exist homeomorphisms of the underlying topological space that materially alter the original problem (e.g., the unique Ockham hypothesis "no particles" would become "no twoticles", which means two particles right away). It is precisely this lack of symmetry in the particle-counting problem that allows Ockham to slice it into objective simplicity degrees. The usual attempts to use coding, entrenchment, or prior probability to force a foliation of the sequence-identification problem into simplicity degrees must reflect the imposition of extraneous considerations lacking in the problem's intrinsic structure as presented. Therefore, such considerations couldn't possibly have anything objective to do with solving the problem (as stated) efficiently. So the theory yields precisely the right judgment when the true nature of the case is properly understood.

One can also arrive at overly symmetrical problems by coarsening the particle-counting problem. For example, consider the question whether there is an even or an odd number of particles. Since this coarsens the particle-counting problem, one again expects "even" to be the Ockham answer when an even number of particles has been observed and "odd" to be the right answer otherwise (figure 41). But the proposed theory of Ockham's razor doesn't agree.

Ockham is once again silenced, but this time the difficulty is exactly reversed: every solution is efficient and every answer is Ockham in every subproblem, so every method satisfies Ockham's razor and the U-turn argument can't even begin (figure 42).

Figure 41: Even/Odd as Particle Counting.

Figure 42: Ockham Under Refinement. (No guidance, all answers Ockham; strong guidance, unique Ockham answer; no guidance, no Ockham answers.)

The theory is right. Yes, if one thinks of the problem as a coarsening of particle counting, "even" must come first. But one could also think of it as a coarsening of counting oneicles instead of particles, where a oneicle is a particle at every stage except for stage one (stages start at zero), at which time it is the absence of a particle. Then the zero-oneicle world is the unique world in which a single particle appears in position one and then no more particles occur. This is an odd world. The one-oneicle worlds include the zero-particle world as well as all the two-particle worlds in which the first particle appears right away. These are all even worlds. Continuing in this way, one obtains a oneicle-counting foliation (figure 43) in which the obvious first conjecture is "odd". But the oneicle translation is a homeomorphism of the space that preserves answers as well, so the particle-counting problem isn't really in the even/odd problem after all, even though "even" and "odd" refer to particles rather than to oneicles!
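The oneicle translation is just a relabeling of worlds, which the following fragment makes vivid; the finite list standing in for an infinite world is an artifact of the sketch.

    def to_oneicles(world):
        # A oneicle occurs wherever a particle does, except at stage 1,
        # where it occurs exactly when a particle does not.
        return [1 - x if stage == 1 else x for stage, x in enumerate(world)]

    print(sum(to_oneicles([0, 1, 0, 0])))  # 0: the odd world with a single
                                           # particle at stage one is the
                                           # zero-oneicle world
    print(sum(to_oneicles([0, 0, 0, 0])))  # 1: the even zero-particle world
                                           # contains exactly one oneicle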


Figure 43: Even/Odd as Oneicle Counting.

Therefore, a prior preference for "even" couldn't have anything to do with the objective efficiency of solutions to the even/odd problem as stated. There is, nonetheless, something unnerving about these examples, for one tends to think of Ockham's razor as a (very) handy inference rule that can be used in running arguments, interleaved with other valid inference rules.

Figure 44: Ockham Does Not Commute With Deduction.

But such use tacitly suggests that Ockham's razor commutes with deduction (figure 44), in the sense that applying Ockham's razor and then deducing a consequence from the resulting theory would yield the same answer as deductively coarsening the possible answers to the original question and then applying Ockham's razor to the coarsened question. That idea is refuted by both of the preceding examples. Anything goes in the singleton problem, but coarsening to the counting problem results in the unique Ockham answer, which may contradict the arbitrary answer to the singleton problem. Or, starting in the counting problem, one is stuck with the Ockham answer (which entails just one of "even" or "odd"), but coarsening to even/odd first allows you to choose an answer conflicting with the Ockham answer to the counting problem.

Be that as it may, the theory's approach is less paradoxical than the alternative. If a problem refines or is refined by different counting problems, insistence on agreement with just one favorite counting problem would render the possible contradictions just mentioned necessary in the unfavored problems. More importantly, the situation is as it must be if Ockham's razor is to be connected objectively to truth-finding performance. For if symmetries allow one to carve different refined problems out of a given coarse problem, there can be no objective sense in which the truth-finding efficiency of the coarse problem is furthered by merely thinking of the coarse problem one way as opposed to another. To insist upon one way of thinking of a coarse problem as opposed to another is a symptom of our uncritical appetite to over-extend Ockham's razor to symmetrical problems in which it couldn't help us find the true answer. This point has the flavor of a Kantian critique: a good a priori principle over-extended is a good a priori principle gone bad, and antinomies and foundational confusion are the sorry but inevitable consequence (figure 45).

Figure 45: Theft Over Honest Toil.

0.16 Conclusion: Ockham's Family Secret Revealed!

Ockham's razor is beloved as an inexhaustible source of free information that somehow parlays the scientist's limited viewpoint into sweeping generalizations about unseen realities (figure 46). The trouble is to explain how such a principle could help us at all. And the irony is that clinging to the myth precludes any such explanation. For Ockham does help us find the truth, but in a spooky and unexpected way. He doesn't provide any guarantee that the theory he selects is true or probably true. He doesn't point at the truth. He can't even bound the number of future surprises or U-turns you will have to make on your way to the truth. All he does is save you the trouble of needless surprises beyond the arbitrarily many surprises the demon is objectively in a position to exact from you. But in that respect, his advice is still uniquely the best.

Figure 46: Ockham's Day Job.

What is spooky is that the explanation fits our practice so well but falls so far short of our craving for certainty and guarantees against future surprises. A skeptical denial that Ockham's razor does anything for us at all is readily rejected, for simplicity is, after all, the nontrivial core of scientific method. The principled argument that Ockham provides the best possible help in a far weaker sense is far more subversive: it uniquely justifies the practice better than alternative stories while acknowledging frankly that simplicity couldn't possibly be doing what we want it to. Our personal feelings of conviction in the theories we choose don't help: beliefs fixed by a method that merely minimizes reversals of opinion would still be beliefs, would still be written in textbooks, and would still be spoken of in hushed, reverential terms and brow-beaten into our students. None of that theatrical pathos implies that Ockham does the impossible. To further the irony, the topological asymmetries in a question that secure Ockham's truth-finding efficacy amount to iterated problems of induction, which correspond to demons stacked on demons stacked on demons. Without this structure, there would be no asymmetry in the problem addressed and no objective simplicity ranking among theories that could possibly be related to truth-finding efficiency. But the greatest irony of all is that Ockham's razor turns out to be defined in terms of just these nested demonic structures lurking in the problem, so that Ockham is actually made of demon-stuff. Not only is Ockham the demon's offspring; the two work together as an inseparable, coordinated team, since Ockham changes his recommendations each time the demon uses up one of his opportunities to fool the scientist (figure 47).

Figure 47: But by Night. . .

To see Ockham as he is, rather than as we would like him to be, requires abandonment of the naive desire to get something for nothing. One must put aside all thought of defeating the problem of induction and, instead, start to respect and study the underlying, objective complexity in problems that gives rise to it, as is routine in the mathematical theories of computability, computational complexity, and descriptive set theory. In these established scientific subjects, nobody would dream of declaring victory over the complexity analysis of a problem. It is late in the day for the philosophy of science and induction to be dreaming still.

0.17 Acknowledgements

I would like to thank Seth Casana, John Taylor, Joseph Ramsey, Richard Scheines, Oliver Schulte, Pieter Adriaans, and Balazs Gyenis for discussions and useful comments. Special thanks are due to the organizers of the Fifth International Conference on Foundations of the Formal Sciences for such an interesting interdisciplinary conference devoted to infinite game theory. The centrality of determinacy to the main results of this paper reflects the influence of some of the excellent presentations by other authors in this volume.

0.18 Bibliography

Akaike, H. (1973) "Information Theory and an Extension of the Maximum Likelihood Principle," Second International Symposium on Information Theory, pp. 267-281.

Carnap, R. (1950) Logical Foundations of Probability, Chicago: University of Chicago Press.

Forster, M. R. and Sober, E. (1994) "How to Tell When Simpler, More Unified, or Less Ad Hoc Theories Will Provide More Accurate Predictions," The British Journal for the Philosophy of Science 45: 1-35.

Friedman, M. (1983) Foundations of Space-Time Theories, Princeton: Princeton University Press.

Glymour, C. (1980) Theory and Evidence, Princeton: Princeton University Press.

Goodman, N. (1983) Fact, Fiction, and Forecast, fourth edition, Cambridge: Harvard University Press.

Jain, S., Osherson, D., Royer, J., and Sharma, A. (1999) Systems That Learn: An Introduction to Learning Theory, Cambridge: MIT Press.

Kant, I. (1988) Kant Selections, L. White Beck, ed., New York: Macmillan.

Kechris, A. (1991) Classical Descriptive Set Theory, New York: Springer.

Kelly, K. (1996) The Logic of Reliable Inquiry, New York: Oxford University Press.


Kelly, K. (2002) "Efficient Convergence Implies Ockham's Razor," Proceedings of the 2002 International Workshop on Computational Models of Scientific Reasoning and Applications, Las Vegas, USA, June 24-27.

Kelly, K. (2004) "Justification as Truth-finding Efficiency: How Ockham's Razor Works," Minds and Machines 14: 485-505.

Kelly, K. and Glymour, C. (2004) "Why Probability Does Not Capture the Logic of Scientific Justification," in C. Hitchcock, ed., Contemporary Debates in the Philosophy of Science, Oxford: Blackwell, pp. 94-114.

Kuhn, T. (1962) The Structure of Scientific Revolutions, Chicago: University of Chicago Press.

Laudan, L. (1981) "A Confutation of Convergent Realism," Philosophy of Science 48: 19-48.

Li, M. and Vitanyi, P. (1997) An Introduction to Kolmogorov Complexity and Its Applications, New York: Springer.

Martin, D. (1985) "A Purely Inductive Proof of Borel Determinacy," in Recursion Theory, Proceedings of Symposia in Pure Mathematics 42: 303-308.

Mitchell, T. (1997) Machine Learning, New York: McGraw-Hill.

Neyman, J. and Pearson, E. (1933) "On the Problem of the Most Efficient Tests of Statistical Hypotheses," Philosophical Transactions of the Royal Society 231A: 289-337.

Popper, K. (1968) The Logic of Scientific Discovery, New York: Harper.

Putnam, H. (1965) "Trial and Error Predicates and a Solution to a Problem of Mostowski," Journal of Symbolic Logic 30: 49-57.

Quine, W. (1969) "Natural Kinds," in Essays in Honor of Carl G. Hempel, N. Rescher, ed., Dordrecht: Reidel.

Rosenkrantz, R. (1983) "Why Glymour is a Bayesian," in Testing Scientific Theories, J. Earman, ed., Minneapolis: University of Minnesota Press.

Schulte, O. (1999a) "The Logic of Reliable and Efficient Inquiry," The Journal of Philosophical Logic 28: 399-438.

Schulte, O. (1999b) "Means-Ends Epistemology," The British Journal for the Philosophy of Science 50: 1-31.

Schulte, O. (2001) "Inferring Conservation Laws in Particle Physics: A Case Study in the Problem of Induction," The British Journal for the Philosophy of Science 51: 771-806.

Scott, D. S. (1982) "Domains for Denotational Semantics," in Automata, Languages and Programming: Ninth Colloquium, M. Nielsen and E. Schmidt, eds., Lecture Notes in Computer Science 140, Berlin: Springer, pp. 577-613.

Spirtes, P., Glymour, C. N., and Scheines, R. (2000) Causation, Prediction, and Search, Cambridge: MIT Press.

Vitanyi, P. and Li, M. (2000) "Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity," IEEE Transactions on Information Theory 46: 446-464.

Wasserman, L. (2000) "Bayesian Model Selection and Model Averaging," Journal of Mathematical Psychology 44: 92-107.

Wasserman, L. (2003) All of Statistics: A Concise Course in Statistical Inference, New York: Springer.

von Leibniz, G. (1951) Leibniz Selections, P. Wiener, ed., New York: Scribner.

