36 Comments
Daniel Greco

For both Chris Samp and Some Lawyer, yeah a natural way to avoid the tension is to constrain fallibilism. It's not that you can't be 100% certain about *anything*. Rather, it's that your 100% certainties should be restricted to some limited class of claims. Some Lawyer is suggesting they should just be claims about your experiences. Chris Samp seems happy for certainties to include claims about poll results. But maybe both will agree no certainty in abstract scientific theories.

I've got my reasons for not loving this style of answer, but I don't think they're knock-down. For claims about experiences, I tend to think that for facts about what you've experienced to be the sorts of facts that you're comfortable treating as absolutely certain, indubitable, etc., you have to drain them of quite a lot of content. E.g., it can't be something like "I'm hungry", because you might be mistaking hunger for some other kind of stomach pain. And I'm not sure that experiences drained of that much content can play the roles they'd need to as evidence. Certainly it's very hard to imagine actually working with runnable models that treat as evidence stuff about what experiences I have, rather than just the poll results themselves. The state space you'd need for such models would be intractably large and unstructured. So once you make this move, you're necessarily, I think, moving to a version of Bayesianism that's got to be pretty detached from either models we explicitly build for forecasting, or even, I think, the ones we implicitly reason with in everyday life; I tend to think actual reasoning involves taking a lot more for granted--thinking about "small worlds", as Savage said--than this kind of picture suggests.

But my preferred approach leads to puzzles too, because you need non-Bayesian ways of ceasing to treat as certain evidence stuff that you previously were treating as certain evidence. E.g., if you treat as evidence that some poll obtained a certain result, and then later learn that there was a typo in the report, the process by which you "unlearn" the evidence about the poll result can't just be a straightforward Bayesian update in the model you started with.
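To see why, concretely: once a proposition gets probability 1, conditioning on anything else in the same model leaves it at 1. Here's a minimal sketch in Python, with made-up numbers for the poll/typo case:

```python
# A minimal sketch, with made-up numbers: worlds are pairs
# (the poll really showed the reported result, the report contains a typo).
prior = {
    (True, False): 0.60,
    (True, True): 0.05,
    (False, False): 0.30,
    (False, True): 0.05,
}

def condition(dist, pred):
    """Strict Bayesian conditioning: keep worlds where pred holds, renormalize."""
    z = sum(p for w, p in dist.items() if pred(w))
    return {w: (p / z if pred(w) else 0.0) for w, p in dist.items()}

def prob(dist, pred):
    return sum(p for w, p in dist.items() if pred(w))

# Treat "the poll showed the reported result" as evidence.
post = condition(prior, lambda w: w[0])
print(prob(post, lambda w: w[0]))        # 1.0

# Later we learn there was a typo. Conditioning on that proposition
# leaves the earlier certainty untouched:
post2 = condition(post, lambda w: w[1])
print(prob(post2, lambda w: w[0]))       # still 1.0; "unlearning" needs some other mechanism
```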

Godshatter

This is a nice point, but it doesn't seem too worrying for the Bayesian fallibilist, because being certain in one's evidence seems to be just an artifact of the idealization required to cast learning episodes into this mathematical framework. In order to model these updates, you need certain elements that you treat as fixed for the sake of the model, and then you update from there. But you can always go back and revise those elements; you don't take them to be fixed forever. (Neurath's boat and all that.) For instance, you can always treat the evidence as a further hypothesis and then ask what the evidence for it is; e.g., is the agency reporting the polls usually reliable, etc.

Chris Schuck

I have no background in this area, but your post jibes with misgivings I've had about the universal applicability of Bayes-style updating, given the relative nature of "evidence." My sense is that evidence often gets too much of a free pass in these kinds of discussions, as if what counts as evidence (and how much it should count) is itself automatically self-evident. For example: how would we operationalize my statement above, that your post subjectively confirms ("jibes with") my subjective misgivings relating to a partly theoretical concern, in terms of quantifiable evidence?

Forgive me if this is naive to ask, but doesn't the standard Popperian view of scientific knowledge presuppose that disconfirming evidence can bring us much closer to certainty of falsity than any confirming evidence can bring us to certainty of truth, by virtue of an inherent asymmetry between the two? Or does that only have to do with the larger context of how the evidence connects with a particular set of claims, at the level of theory? In any case, this raises the question of whether the context in which evidence is functioning epistemically might be inseparable from any calculation of its probabilistic value.

My one general thought from your piece is that it underscores the value of scrutinizing any qualitative differences (including metaphysical differences) that may be relevant between the nature of the stuff forming the basis for evidence and the nature of the stuff forming the basis for your hypothesis. In the Nate Silver polling example, the "evidence" is polling numbers, which are based on people's reported attitudes in questionnaires or over the phone, while the hypothesis has to do with an election outcome based on what people punched into a voting machine, which is then processed through the Electoral College. These are somewhat different logical types which may not translate into numbers in quite the same way (even questionnaires vs. phone interviews add an extra wrinkle). The underlying structure of the hypothesis may not track well with the underlying structure of the evidence. In a way, this recalls the claims by some theoretically-minded psychologists (most notably Joel Michell) that human psychological phenomena don't have the appropriate metaphysical structure to be coherently translated into quantitative data.

Daniel Greco

I do intend to talk about Popper in a future post. His fallibilism, as I understand it, is the kind of "restricted" fallibilism I mention in a comment above. He thinks you never get to be certain (or even confident) in the truth of abstract, general scientific theories, but he doesn't seem nearly so worried about getting to learn observational claims for certain. The idea that you can falsify a theory by making an observation that's inconsistent with the theory seems to take for granted that you actually get to know for certain what you're observing.

Kenny Easwaran

I’ve lately been thinking about cases like Williamson’s unmarked clock, where our “evidence” doesn’t come from a partition. The best treatment I have of this says that the “evidence” isn’t actually even a proposition in these cases, but is treated as something outside the algebra to which probabilities apply. It turns out the mathematically optimal rule for updating in these cases is equivalent to extending the algebra, doing Bayesian conditioning as if this had been a proposition you had credences in, and pulling back to the original algebra, and ends up looking like a Jeffrey update.
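For concreteness, here's a toy numerical version of that recipe, with made-up numbers in the spirit of Jeffrey's cloth example: extend the algebra with the "glimpse" event, condition on it, pull back to the original algebra, and compare with Jeffrey's formula. (In this toy the glimpse depends only on the color, which is what makes the two procedures line up.)

```python
# Toy version, with made-up numbers: the original algebra talks about the
# cloth's color and whether it sells; the extended algebra adds the dim glimpse,
# which depends only on the color.
colors = ["green", "blue"]
p_color = {"green": 0.3, "blue": 0.7}
p_sell_given_color = {"green": 0.8, "blue": 0.4}
p_glimpse_given_color = {"green": 0.7, "blue": 0.2}

# Joint distribution over (color, sells, glimpse) in the extended algebra.
joint = {}
for c in colors:
    for sells in (True, False):
        for glimpse in (True, False):
            joint[(c, sells, glimpse)] = (
                p_color[c]
                * (p_sell_given_color[c] if sells else 1 - p_sell_given_color[c])
                * (p_glimpse_given_color[c] if glimpse else 1 - p_glimpse_given_color[c])
            )

# (1) Extend, condition on the glimpse, pull back to the original algebra.
z = sum(p for (c, s, g), p in joint.items() if g)
p_sell_pullback = sum(p for (c, s, g), p in joint.items() if g and s) / z

# (2) Jeffrey update on the color partition with weights q(c) = P(c | glimpse).
q = {c: sum(p for (c2, s, g), p in joint.items() if g and c2 == c) / z for c in colors}
p_sell_jeffrey = sum(q[c] * p_sell_given_color[c] for c in colors)

print(p_sell_pullback, p_sell_jeffrey)   # 0.64 and 0.64: the two procedures agree
```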

Daniel Greco

This sounds a bit like the thought that in Jeffrey's original case, you can think of it as updating on a regular old proposition about the cloth having some particular indeterminate look that it's more likely (but not certain) to have if it's green.

Kenny Easwaran

Yeah I think it’s basically that same idea, but with a bit less metaphysical commitment to there being propositions or evidence taking the form of a proposition.

Daniel Gallagher

Incidentally, non-partitional evidence also arises in Sleeping Beauty cases, and I think that's the main reason why it's possible to use them to construct Dutch books against basically every decision theory.

Daniel Gallagher

I think this is interesting because in the unmarked clock case, you can argue that the agent is not ideally rational since they are not reflecting on their own credences. But in Sleeping Beauty it can occur even for agents who can do this, because of the memory erasing.

Henri B

Isn't there a tension between "I'm talking about Bayesian epistemology/philosophy of science, not just Bayesian methods in applied statistics" and your point that Jeffrey's rule is (almost) never used in Bayesian statistics? When Bay Area rationalists assign arbitrary-seeming numbers to the possibility of AI apocalypse or whatever, these are of course never the result of explicit Bayesian modeling. But this needn't be a problem, if updating by Jeffrey's rule is something like an ideal rather than a methodology.

Daniel Greco

Here's what I think I want to say. I see the Bay Area rationalists as looking at Bayesian methods in applied statistics and thinking, "that is a good model for all of rational cognition"--even while the applied statisticians themselves may be much more modest about the proper scope of their methods. So if applied statisticians can't get by without a bunch of certainties, that's awkward for a view on which avoiding any certainties is the ideal.

At the very least, I think reflecting on this stuff makes clear how much distance there is between anything we could actually do, implicitly or explicitly, and the rationalist ideal of updating a giant probability function by Jeffrey conditionalization.

Henri B

Ok, that makes sense, but I'm not sure it's true as a claim about Bay Area rationalists specifically. My sense from hanging out on lesswrong is that they're similar to philosophers in being much more impressed by simple and elegant mathematical models than by messy empirical work.

About the intractability of Jeffrey's rule: sure, Bayesian updating is less demanding than Jeffrey's rule, which requires you to repeat Bayesian updating once for each cell in the partition. But it's also computationally intractable. If we really want a tractable rule, both Jeffrey and Bayes are out the window. And it's not hard to imagine an argument that the best tractable rules are "fallibilist": it's a waste of cognitive resources to have beliefs about neurons firing, so the best tractable rules will just shift around your beliefs about everyday matters in response to neurons firing rather than update directly on the fact that they fired. I'd bet that the optimal way to shift such beliefs around won't make one certain of anything.
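For concreteness, here is Jeffrey's rule in miniature, with toy numbers, showing the one-strict-update-per-cell structure just mentioned:

```python
# Jeffrey's rule in miniature, with toy numbers: worlds are (hypothesis, cell).
prior = {("H", "E1"): 0.30, ("H", "E2"): 0.20,
         ("notH", "E1"): 0.10, ("notH", "E2"): 0.40}

def condition_on_cell(dist, cell):
    """One strict Bayesian update, restricted to a single cell of the partition."""
    z = sum(p for (h, e), p in dist.items() if e == cell)
    return {w: (p / z if w[1] == cell else 0.0) for w, p in dist.items()}

# New, less-than-certain weights over the partition {E1, E2}.
weights = {"E1": 0.7, "E2": 0.3}

# One conditionalization per cell, then mix the results by the new weights.
per_cell = {cell: condition_on_cell(prior, cell) for cell in weights}
jeffrey = {w: sum(weights[cell] * per_cell[cell][w] for cell in weights) for w in prior}

print(sum(p for (h, e), p in jeffrey.items() if h == "H"))   # 0.625
```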

Daniel Greco

So first, yeah, even strict Bayesian updating is famously NP-hard, so if you think the brain is somehow Bayesian, you've got to think it's approximating Bayesian updates rather than hitting them exactly. But there's a rich tradition of approximate Bayesian updating, so I think that's an attractive picture.

But second, and even more handwavy: the picture I like is a kind of two-stage one, where decision and inference involve first somehow identifying small-world models to solve, and then solving them in broadly familiar, Bayesian/decision-theoretic ways. That's an alternative to a picture where we're always working with "grand world" problems.

On this picture, you can have plenty of certainties in your small-world models, because you're not always using the same small-world models. E.g., maybe you start out using a small-world model where someone is tossing a fair coin. But if you're then asked to bet on whether it's double-headed, you switch (how? Good question! It can't just be a Bayesian update) to a different small-world model that makes room for that possibility.
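A minimal sketch of that coin example, with made-up numbers for the priors:

```python
# Two small-world models, with made-up numbers.

# Model 1: fairness is built in; the only question is the next toss.
model1 = {"heads": 0.5, "tails": 0.5}
# "Is this coin double-headed?" isn't a proposition in this state space at all.

# Model 2: reframe so the coin's bias is itself uncertain.
prior = {"fair": 0.95, "double_headed": 0.05}
p_heads = {"fair": 0.5, "double_headed": 1.0}

def update_on_heads(prior, n):
    """Condition model 2 on seeing n heads in a row."""
    unnorm = {h: p * p_heads[h] ** n for h, p in prior.items()}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

print(update_on_heads(prior, 5))
# After 5 heads, "double_headed" rises from 0.05 to roughly 0.63.
# The switch from model 1 to model 2 was not itself a Bayesian update.
```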

Kolmogorov's Ghost

This seems possible to accommodate in normal Bayesian frameworks simply by being very careful about the thing you're 100% certain of. In the election forecasting example, you're 100% certain that your database has a row that says "Pollster name = some pollster, date = some date, result = some result." Of course, this doesn't mean you're 100% certain the pollster actually published this result on that date (maybe your scraping is buggy), and you're less certain still that the pollster actually polled that many people and that they answered as indicated in your result. It's unclear to me why this isn't fully addressed by modeling the pollster as producing a number that is correlated with, but not exactly equal to, the voting intentions of eligible voters.
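A rough sketch of that kind of model, with made-up numbers and a deliberately tiny state space:

```python
# A rough sketch, with made-up numbers: true support -> what the pollster
# publishes -> the row my scraper stored. Only the stored row is certain.
support_levels = [0.44, 0.47, 0.50]
prior_support = {0.44: 0.3, 0.47: 0.4, 0.50: 0.3}

def p_published_given_true(pub, true):
    # Pollster numbers are correlated with, not equal to, true intentions.
    return 0.6 if pub == true else 0.2

def p_row_given_published(row, pub):
    # Scraping is nearly, but not perfectly, reliable.
    return 0.98 if row == pub else 0.01

observed_row = 0.47   # the one thing assigned probability 1

posterior = {}
for true in support_levels:
    posterior[true] = sum(
        prior_support[true]
        * p_published_given_true(pub, true)
        * p_row_given_published(observed_row, pub)
        for pub in support_levels
    )
z = sum(posterior.values())
posterior = {t: p / z for t, p in posterior.items()}

print(posterior)
# Certainty about the database row leaves plenty of uncertainty about both the
# published poll and the underlying voting intentions.
```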

Pete Griffiths

Excellent explanation

Bob Jacobs

I’m no longer a Bayesian, but there are ways to tackle the problem laid out in this post. For example, in my post “Solutions to problems with Bayesianism” ( https://bobjacobs.substack.com/p/solutions-to-problems-with-bayesianism ) I propose that we don’t use single numbers but use an interval instead (well, I actually propose something closer to a convex hull, but I’ll stick to interval in this explanation for the sake of simplicity).

So instead of a single P, we have a set K. Your prior is K = {P^(i)}.

When you observe E, condition every member:

K_E = { P^(i)(· | E) }.

Within each P^(i) we still have P^(i)(E) = 1, so Bayes's formula stays intact, but with P^(i) in place of P: P^(i)(H | E) = P^(i)(H) · P^(i)(E | H) / P^(i)(E).

Your overall confidence in E is now an interval:

P_lower(E) = inf_{P ∈ K_E} P(E)  and  P_upper(E) = 1.

Because P_lower(E) can stay below 1, you are not forced into certainty, even though each precise model is.

Now, a drawback of this model is that it's more computationally expensive, but since we're doing epistemology and not software design, I don't think this is much of an objection.
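A minimal sketch of the recipe above, with made-up numbers (three members of K, each a precise prior-plus-likelihood):

```python
# A minimal sketch of the set-of-priors update, with made-up numbers.
# Each member of K is a precise prior over H plus a likelihood for E.
K = [
    {"p_H": 0.5, "p_E_given_H": 0.90, "p_E_given_notH": 0.30},
    {"p_H": 0.4, "p_E_given_H": 0.80, "p_E_given_notH": 0.50},
    {"p_H": 0.6, "p_E_given_H": 0.95, "p_E_given_notH": 0.20},
]

def p_E(m):
    return m["p_H"] * m["p_E_given_H"] + (1 - m["p_H"]) * m["p_E_given_notH"]

def posterior_H(m):
    # Ordinary Bayes within each precise member of K.
    return m["p_H"] * m["p_E_given_H"] / p_E(m)

# Prior confidence in E is already an interval rather than a single number:
print(min(p_E(m) for m in K), max(p_E(m) for m in K))

# Condition every member on E; confidence in H is still an interval:
print(min(posterior_H(m) for m in K), max(posterior_H(m) for m in K))
# Each conditioned member assigns E probability 1, but the members disagree
# about everything else, so no single sharp posterior is forced on you.
```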

Jake Zuehl

Thanks for this public service!

John Encaustum

Nice writeup! Personally, my favorite treatment of these issues I've seen so far is Wolfgang Spohn's in The Laws of Belief. I'm still digesting it, and I'm not convinced that the ranking functions are a proper fix for the issues I care about most, but it's been extremely thought provoking for me and has inspired significant further reading into tropical arithmetic and some topos theory.

Some Lawyer

Don't we only have to be 100% sure that a certain quale was experienced to rely on Bayesianism? And is that such a terrible bullet to have to bite for fallibilism?

For example, let's say I would like to apply my likelihood P(E|H), but I acknowledge I don't have direct access to E (which is some evidence/fact unintermediated by polls, mental representations, etc.). Instead I have E', which is the quale I would have if I thought I *knew* I experienced E (but which might also result from certain things happening that aren't E). I also have the analogous H'.

So I use P(E’|H’) and P(E’) to update my P(H’). My P(E) also updates (based on the likelihood that E’ occurred given E) as does my P(H) (on the same reasoning but for H’ and H).

With this method we can simply deny, e.g., that we have access to P(E|E); instead we can only use P(E'|E) to update E (which is very slightly less than 1, reflecting risks that E' does not map to reality because of cognitive limitations, bad data, memory corruption, etc.).

We still have P(E’|E’)=1, but E’ is the sort of phenomenological experience to which we can attach total certainty while maintaining the essential components of fallibilism. After all, P(E’|E’) merely says we can be sure our thoughts and experiences exist, yet we remain fallible as to how they map onto the world (so P(E) can be below 1 even when P(E’) is at 1).

Probably no one actually uses this "mental states model" approach when applying Bayes, and yet people get good results. But I can simply acknowledge that, in applying Bayes, we generally act as if we're directly working with E and H instead of E' and H' as a simplifying convention. So this simplified approach does have P(E|E)=1, but it's not what I really believe as a fallibilist. Therefore, my belief in fallibilism is consistent with my Bayesianism.
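A toy numerical version of the E'-for-E move, with made-up numbers:

```python
# A toy version of the E'-for-E move, with made-up numbers.
# E  = the worldly fact (say, the poll really showed the reported result)
# E' = my experience as of seeming to learn E; only E' gets probability 1.
p_E = 0.5
p_Eprime_given_E = 0.99      # the seeming tracks the fact well, but not perfectly
p_Eprime_given_notE = 0.05   # misreadings, bad data, memory corruption, etc.

p_Eprime = p_E * p_Eprime_given_E + (1 - p_E) * p_Eprime_given_notE
p_E_given_Eprime = p_E * p_Eprime_given_E / p_Eprime

print(p_E_given_Eprime)   # roughly 0.95: high confidence in E, but no certainty
```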

David Riceman

Hartigan in the seventies did some work on having a set of priors. You might be able to get a similar effect by comparing a loss function over a deterministic guess and over a distribution. I'd guess you can find contexts in which there are distributions which are uniformly better. Then find a way to improve that, take a limit, and voila.

Ali Afroz

I think the actual solution to your problem is that probability theory is an approximation we use because, while we don't have hundred percent certainty, we are certain enough to treat the uncertainty as a rounding error. For example, in election forecasting you might not be certain that you remember the results of a poll correctly even if you just checked it a second ago, but you're certain enough to act as if it's 100% certain for practical purposes without causing large enough errors to be a problem. After all, to be very honest, probability theory relies on assuming a bunch of statements with 100% certainty, like that the probabilities of all the possibilities can't add up to more than one. Now, logical uncertainty even about basic and seemingly obvious mathematical statements is in fact pretty plausible, especially since we have been wrong in the past about which mathematical statements are definitely true (just look at Euclidean geometry), but we can probably have enough confidence in the basic axioms of probability theory to treat the shortfall from 100% certainty as a rounding error. So even though we don't actually have 100% certainty, we can use probability theory for practical updating in response to new information in real life.

Phoenix

Your reasoning for why Bayes's formula necessitates infallibilism seems fishy. You write the formula as "Pnew(H)=P(H)*P(E|H)/P(E)." That then lets you replace H with E and obtain the equation "Pnew(E)=P(E|E)," which you use to show that Pnew(E) is equal to 1.

However, I don't think you got Bayes's formula right. There's no such thing as "Pnew". The way I've always seen Bayes's formula, and I double-checked on Wikipedia, is "P(H|E)=P(H)*P(E|H)/P(E)." The "new" probability of H just means the probability of H given E. With that established, if you replace H with E, you just get the identity "P(E|E)=P(E|E)." And that obviously proves nothing.

Daniel Greco

The reason you'll sometimes see the term on the left as P(H|E) and sometimes as Pnew(H) is that another way of expressing the whole idea is this: when you learn E, your new probabilities for any hypothesis H should be your old probabilities, conditional on E. That is:

Pnew(H) = Pold(H|E)

Together with the definition of conditional probability, that gives you the rule as I expressed it. Put differently, the Wikipedia definition just relates the conditional probability P(H|E) to other quantities. It's a further claim that your new probability once you learn E should be your old conditional probability P(H|E), but given that further claim, there's no difference between how I put things and how you saw them on Wikipedia.
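Spelled out as a short derivation (just restating the two claims above):

```latex
% Conditionalization: new credences are old credences conditional on the evidence.
P_{\mathrm{new}}(H) \;=\; P_{\mathrm{old}}(H \mid E)
                    \;=\; \frac{P_{\mathrm{old}}(H)\,P_{\mathrm{old}}(E \mid H)}{P_{\mathrm{old}}(E)}
% Substituting H = E gives the point at issue:
P_{\mathrm{new}}(E) \;=\; P_{\mathrm{old}}(E \mid E) \;=\; 1
```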

Swen Werner

Greco’s post claims to expose a “dirty secret” in Bayesian reasoning, but ironically, it commits a fundamental misuse of Bayes’s rule. His Caribbean island example illegitimately invokes Bayesian updating in a context where no prior causal or epistemic connection exists between the predictor (the man in Times Square) and the outcome (the inheritance). Bayes’s rule only works meaningfully within a structured probabilistic model—his example is a textbook case of post hoc reasoning dressed up as formal inference. Far from revealing a flaw in Bayesianism, the post illustrates exactly what happens when you apply Bayes’s rule without understanding its assumptions. In this case, the article evidences a concerning lack of understanding of the very fundamentals of the subject it sets out to critique.

Daniel Greco

Your comment and some others here are making me realize I should've more explicitly addressed what I now see is a common misunderstanding. The idea that "Bayes's rule only works meaningfully within a structured probabilistic model", while defensible, is far from uncontroversial. Dissent from that idea is not "misunderstanding" but substantive disagreement. The people I'm calling Bayesians have a much more capacious view of the proper scope of Bayesian reasoning; they think that the sort of stuff Bayesian statisticians do when analyzing data is just a more explicit version of good informal empirical reasoning, as well as of the kind of reasoning scientists do when they decide whether to accept a novel theory (where you'll never have the kind of background frequency information you'd really want if you were going to build a convincing statistical model). And they think that we can distinguish better and worse informal empirical reasoning by thinking of it through the lens of Bayes's rule. I linked to Sean Carroll to make clear that the view isn't a straw man, but I could easily have found quotes from Bay Area rationalists, or, to take an example less familiar to Substackers but influential in the philosophy of science, Howson and Urbach's classic tome "Scientific Reasoning: The Bayesian Approach."

Swen Werner

User was temporarily suspended for this comment.
Ebenezer

Bayes' Rule is basically fiction anyways. E and H are mere abstractions which are supposed to correspond to reality. The simplest reconciliation is something like: "In a simplified world where we have random variables E and H which are supposed to correspond to X and Y aspects of the world, we can update our probabilities as follows: [...]"

This is something Richard McElreath talks about in his book/lectures I believe. The difference between the "small world" of the model and the "big world" in which we actually reside.
