36 Comments
Daniel Greco

For both Chris Samp and Some Lawyer, yeah a natural way to avoid the tension is to constrain fallibilism. It's not that you can't be 100% certain about *anything*. Rather, it's that your 100% certainties should be restricted to some limited class of claims. Some Lawyer is suggesting they should just be claims about your experiences. Chris Samp seems happy for certainties to include claims about poll results. But maybe both will agree no certainty in abstract scientific theories.

I've got my reasons for not loving this style of answer, but I don't think they're knock-down. For claims about experiences, I tend to think that for facts about what you've experienced to be the sorts of facts that you're comfortable treating as absolutely certain, indubitable, etc., you have to drain them of quite a lot of content. E.g., it can't be something like "I'm hungry", because you might be mistaking hunger for some other kind of stomach pain. And I'm not sure that experiences drained of that much content can play the roles they'd need to as evidence. Certainly it's very hard to imagine actually working with runnable models that treat as evidence stuff about what experiences I have, rather than just the poll results themselves. The state space you'd need for such models would be intractably large and unstructured. So once you make this move, you're necessarily, I think, moving to a version of Bayesianism that's got to be pretty detached from either models we explicitly build for forecasting, or even, I think, the ones we implicitly reason with in everyday life; I tend to think actual reasoning involves taking a lot more for granted--thinking about "small worlds", as Savage said--than this kind of picture suggests.

But my preferred approach leads to puzzles too, because you need non-Bayesian ways of ceasing to treat as certain evidence stuff that you previously were treating as certain evidence. E.g., if you treat as evidence that some poll obtained a certain result, and then later learn that there was a typo in the report, the process by which you "unlearn" the evidence about the poll result can't just be a straightforward Bayesian update in the model you started with.
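To see why, concretely: once a proposition gets probability 1, conditioning on anything else in the same model leaves it at 1. Here's a minimal sketch in Python, with made-up numbers for the poll/typo case:

```python
# A minimal sketch, with made-up numbers: worlds are pairs
# (the poll really showed the reported result, the report contains a typo).
prior = {
    (True, False): 0.60,
    (True, True): 0.05,
    (False, False): 0.30,
    (False, True): 0.05,
}

def condition(dist, pred):
    """Strict Bayesian conditioning: keep worlds where pred holds, renormalize."""
    z = sum(p for w, p in dist.items() if pred(w))
    return {w: (p / z if pred(w) else 0.0) for w, p in dist.items()}

def prob(dist, pred):
    return sum(p for w, p in dist.items() if pred(w))

# Treat "the poll showed the reported result" as evidence.
post = condition(prior, lambda w: w[0])
print(prob(post, lambda w: w[0]))        # 1.0

# Later we learn there was a typo. Conditioning on that proposition
# leaves the earlier certainty untouched:
post2 = condition(post, lambda w: w[1])
print(prob(post2, lambda w: w[0]))       # still 1.0; "unlearning" needs some other mechanism
```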

Godshatter

This is a nice point, but it doesn't seem too worrying for the Bayesian fallibilist, because being certain in one's evidence seems to be just an artifact of the idealization required to cast learning episodes into this mathematical framework. In order to model these updates, you need certain elements that you treat as fixed for the sake of the model, and then you update from there. But you can always go back and revise those elements; you don't take them to be fixed forever. (Neurath's boat and all that.) For instance, you can always treat the evidence as a further hypothesis and then ask what the evidence for it is; e.g., is the agency reporting the polls usually reliable, etc.

Chris Schuck

I have no background in this area, but your post jibes with misgivings I've had about the universal applicability of Bayes-style updating, given the relative nature of "evidence." My sense is that evidence often gets too much of a free pass in these kinds of discussions, as if what counts as evidence (and how much it should count) is itself automatically self-evident. For example: how would we operationalize my statement above, that your post subjectively confirms ("jibes with") my subjective misgivings relating to a partly theoretical concern, in terms of quantifiable evidence?

Forgive me if this is naive to ask, but doesn't the standard Popperian view of scientific knowledge presuppose that disconfirming evidence can bring us much closer to certainty of falsity than any confirming evidence can bring us to certainty of truth, by virtue of an inherent asymmetry between the two? Or does that only have to do with the larger context of how the evidence connects with a particular set of claims, at the level of theory? In any case, this raises the question of whether the context in which evidence is functioning epistemically might be inseparable from any calculation of its probabilistic value.

My one general thought from your piece is that it underscores the value of scrutinizing any qualitative differences (including metaphysical differences) that may be relevant between the nature of the stuff forming the basis for evidence and the nature of the stuff forming the basis for your hypothesis. In the Nate Silver polling example, the "evidence" is polling numbers, which are based on people's reported attitudes in questionnaires or over the phone, while the hypothesis has to do with an election outcome based on what people punched into a voting machine, which is then processed through the Electoral College. These are somewhat different logical types which may not translate into numbers in quite the same way (even questionnaires vs. phone interviews add an extra wrinkle). The underlying structure of the hypothesis may not track well with the underlying structure of the evidence. In a way, this recalls the claims by some theoretically-minded psychologists (most notably Joel Michell) that human psychological phenomena don't have the appropriate metaphysical structure to be coherently translated into quantitative data.

Daniel Greco

I do intend to talk about Popper in a future post. His fallibilism, as I understand it, is the kind of "restricted" fallibilism I mention in a comment above. He thinks you never get to be certain (or even confident) in the truth of abstract, general scientific theories, but he doesn't seem nearly so worried about getting to learn observational claims for certain. The idea that you can falsify a theory by making an observation that's inconsistent with the theory seems to take for granted that you actually get to know for certain what you're observing.

Kenny Easwaran

I’ve lately been thinking about cases like Williamson’s unmarked clock, where our “evidence” doesn’t come from a partition. The best treatment I have of this says that the “evidence” isn’t actually even a proposition in these cases, but is treated as something outside the algebra to which probabilities apply. It turns out the mathematically optimal rule for updating in these cases is equivalent to extending the algebra, doing Bayesian conditioning as if this had been a proposition you had credences in, and pulling back to the original algebra, and ends up looking like a Jeffrey update.
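For concreteness, here's a toy numerical version of that recipe, with made-up numbers in the spirit of Jeffrey's cloth example: extend the algebra with the "glimpse" event, condition on it, pull back to the original algebra, and compare with Jeffrey's formula. (In this toy the glimpse depends only on the color, which is what makes the two procedures line up.)

```python
# Toy version, with made-up numbers: the original algebra talks about the
# cloth's color and whether it sells; the extended algebra adds the dim glimpse,
# which depends only on the color.
colors = ["green", "blue"]
p_color = {"green": 0.3, "blue": 0.7}
p_sell_given_color = {"green": 0.8, "blue": 0.4}
p_glimpse_given_color = {"green": 0.7, "blue": 0.2}

# Joint distribution over (color, sells, glimpse) in the extended algebra.
joint = {}
for c in colors:
    for sells in (True, False):
        for glimpse in (True, False):
            joint[(c, sells, glimpse)] = (
                p_color[c]
                * (p_sell_given_color[c] if sells else 1 - p_sell_given_color[c])
                * (p_glimpse_given_color[c] if glimpse else 1 - p_glimpse_given_color[c])
            )

# (1) Extend, condition on the glimpse, pull back to the original algebra.
z = sum(p for (c, s, g), p in joint.items() if g)
p_sell_pullback = sum(p for (c, s, g), p in joint.items() if g and s) / z

# (2) Jeffrey update on the color partition with weights q(c) = P(c | glimpse).
q = {c: sum(p for (c2, s, g), p in joint.items() if g and c2 == c) / z for c in colors}
p_sell_jeffrey = sum(q[c] * p_sell_given_color[c] for c in colors)

print(p_sell_pullback, p_sell_jeffrey)   # 0.64 and 0.64: the two procedures agree
```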

Daniel Greco

This sounds a bit like the thought that in Jeffrey's original case, you can think of it as updating on a regular old proposition about the cloth having some particular indeterminate look that it's more likely (but not certain) to have if it's green.

Kenny Easwaran

Yeah I think it’s basically that same idea, but with a bit less metaphysical commitment to there being propositions or evidence taking the form of a proposition.

Daniel Gallagher

Incidentally, non-partitional evidence also arises in Sleeping Beauty cases, and I think that's the main reason why it's possible to use them to construct Dutch books against basically every decision theory.

Daniel Gallagher

I think this is interesting because in the unmarked clock case, you can argue that the agent is not ideally rational since they are not reflecting on their own credences. But in Sleeping Beauty it can occur even for agents who can do this, because of the memory erasing.

Henri B

Isn't there a tension between "I'm talking about Bayesian epistemology/philosophy of science, not just Bayesian methods in applied statistics" and your point that Jeffrey's rule is (almost) never used in Bayesian statistics? When Bay Area rationalists assign arbitrary-seeming numbers to the possibility of AI apocalypse or whatever, these are of course never the result of explicit Bayesian modeling. But this needn't be a problem, if updating by Jeffrey's rule is something like an ideal rather than a methodology.

Daniel Greco

Here's what I think I want to say. I see the Bay Area rationalists as looking at Bayesian methods in applied statistics and thinking, "that is a good model for all of rational cognition"--even while the applied statisticians themselves may be much more modest about the proper scope of their methods. So if applied statisticians can't get by without a bunch of certainties, that's awkward for a view on which avoiding any certainties is the ideal.

At the very least, I think reflecting on this stuff makes clear how much distance there is between anything we could actually do, implicitly or explicitly, and the rationalist ideal of updating a giant probability function by Jeffrey conditionalization.

Henri B

Ok, that makes sense, but I'm not sure it's true as a claim about Bay Area rationalists specifically. My sense from hanging out on lesswrong is that they're similar to philosophers in being much more impressed by simple and elegant mathematical models than by messy empirical work.

About the intractability of Jeffrey's rule: sure, Bayesian updating is less demanding than Jeffrey's rule, which requires you to repeat Bayesian updating once for each cell in the partition. But it's also computationally intractable. If we really want a tractable rule, both Jeffrey and Bayes are out the window. And it's not hard to imagine an argument that the best tractable rules are "fallibilist": it's a waste of cognitive resources to have beliefs about neurons firing, so the best tractable rules will just shift around your beliefs about everyday matters in response to neurons firing rather than update directly on the fact that they fired. I'd bet that the optimal way to shift such beliefs around won't make one certain of anything.
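For concreteness, here is Jeffrey's rule in miniature, with toy numbers, showing the one-strict-update-per-cell structure just mentioned:

```python
# Jeffrey's rule in miniature, with toy numbers: worlds are (hypothesis, cell).
prior = {("H", "E1"): 0.30, ("H", "E2"): 0.20,
         ("notH", "E1"): 0.10, ("notH", "E2"): 0.40}

def condition_on_cell(dist, cell):
    """One strict Bayesian update, restricted to a single cell of the partition."""
    z = sum(p for (h, e), p in dist.items() if e == cell)
    return {w: (p / z if w[1] == cell else 0.0) for w, p in dist.items()}

# New, less-than-certain weights over the partition {E1, E2}.
weights = {"E1": 0.7, "E2": 0.3}

# One conditionalization per cell, then mix the results by the new weights.
per_cell = {cell: condition_on_cell(prior, cell) for cell in weights}
jeffrey = {w: sum(weights[cell] * per_cell[cell][w] for cell in weights) for w in prior}

print(sum(p for (h, e), p in jeffrey.items() if h == "H"))   # 0.625
```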

Daniel Greco

So first, yeah, even strict Bayesian updating is famously NP-hard, so if you think the brain is somehow Bayesian, you've got to think it's approximating Bayesian updates rather than hitting them exactly. But there's a rich tradition of approximate Bayesian updating, so I think that's an attractive picture.

But second, and even more handwavy: the picture I like is a kind of two-stage one, where decision and inference involve first somehow identifying small-world models to solve, and then solving them in broadly familiar, Bayesian/decision-theoretic ways. That's an alternative to a picture where we're always working with "grand world" problems.

On this picture, you can have plenty of certainties in your small-world models, because you're not always using the same small-world models. E.g., maybe you start out using a small-world model where someone is tossing a fair coin. But if you're then asked to bet on whether it's double-headed, you switch (how? Good question! It can't just be a Bayesian update) to a different small-world model that makes room for that possibility.
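A minimal sketch of that coin example, with made-up numbers for the priors:

```python
# Two small-world models, with made-up numbers.

# Model 1: fairness is built in; the only question is the next toss.
model1 = {"heads": 0.5, "tails": 0.5}
# "Is this coin double-headed?" isn't a proposition in this state space at all.

# Model 2: reframe so the coin's bias is itself uncertain.
prior = {"fair": 0.95, "double_headed": 0.05}
p_heads = {"fair": 0.5, "double_headed": 1.0}

def update_on_heads(prior, n):
    """Condition model 2 on seeing n heads in a row."""
    unnorm = {h: p * p_heads[h] ** n for h, p in prior.items()}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

print(update_on_heads(prior, 5))
# After 5 heads, "double_headed" rises from 0.05 to roughly 0.63.
# The switch from model 1 to model 2 was not itself a Bayesian update.
```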

Kolmogorov's Ghost

This seems possible to accommodate in normal Bayesian frameworks simply by being very careful about the thing you're 100% certain of. In the election forecasting example, you're 100% certain that your database has a row that says "Pollster name = some pollster, date = some date, result = some result." Of course, this doesn't mean you're 100% certain the pollster actually published this result on that date (maybe your scraping is buggy), and you're less certain still that the pollster actually polled that many people and that they answered as indicated in your result. It's unclear to me why this isn't fully addressed by modeling the pollster as producing a number that is correlated with, but not exactly equal to, the voting intentions of eligible voters.
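A rough sketch of that kind of model, with made-up numbers and a deliberately tiny state space:

```python
# A rough sketch, with made-up numbers: true support -> what the pollster
# publishes -> the row my scraper stored. Only the stored row is certain.
support_levels = [0.44, 0.47, 0.50]
prior_support = {0.44: 0.3, 0.47: 0.4, 0.50: 0.3}

def p_published_given_true(pub, true):
    # Pollster numbers are correlated with, not equal to, true intentions.
    return 0.6 if pub == true else 0.2

def p_row_given_published(row, pub):
    # Scraping is nearly, but not perfectly, reliable.
    return 0.98 if row == pub else 0.01

observed_row = 0.47   # the one thing assigned probability 1

posterior = {}
for true in support_levels:
    posterior[true] = sum(
        prior_support[true]
        * p_published_given_true(pub, true)
        * p_row_given_published(observed_row, pub)
        for pub in support_levels
    )
z = sum(posterior.values())
posterior = {t: p / z for t, p in posterior.items()}

print(posterior)
# Certainty about the database row leaves plenty of uncertainty about both the
# published poll and the underlying voting intentions.
```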

Pete Griffiths

Excellent explanation

Bob Jacobs

I’m no longer a Bayesian, but there are ways to tackle the problem laid out in this post. For example, in my post “Solutions to problems with Bayesianism” ( https://bobjacobs.substack.com/p/solutions-to-problems-with-bayesianism ) I propose that we don’t use single numbers but use an interval instead (well, I actually propose something closer to a convex hull, but I’ll stick to interval in this explanation for the sake of simplicity).

So instead of a single P, we have a set K. Your prior is K = {P^(i)}.

When you observe E, condition every member:

K_E = { P^(i)(· | E) }.

Within each P^(i) we still have P^(i)(E) = 1, so Bayes's formula stays intact, but with P^(i) in place of P: P^(i)(H | E) = P^(i)(H) · P^(i)(E | H) / P^(i)(E).

Your overall confidence in E is now an interval:

P_lower(E) = inf_{P ∈ K_E} P(E)  and  P_upper(E) = 1.

Because P_lower(E) can stay below 1, you are not forced into certainty, even though each precise model is.

Now, a drawback of this model is that it's more computationally expensive, but since we're doing epistemology and not software design, I don't think this is much of an objection.
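A minimal sketch of the recipe above, with made-up numbers (three members of K, each a precise prior-plus-likelihood):

```python
# A minimal sketch of the set-of-priors update, with made-up numbers.
# Each member of K is a precise prior over H plus a likelihood for E.
K = [
    {"p_H": 0.5, "p_E_given_H": 0.90, "p_E_given_notH": 0.30},
    {"p_H": 0.4, "p_E_given_H": 0.80, "p_E_given_notH": 0.50},
    {"p_H": 0.6, "p_E_given_H": 0.95, "p_E_given_notH": 0.20},
]

def p_E(m):
    return m["p_H"] * m["p_E_given_H"] + (1 - m["p_H"]) * m["p_E_given_notH"]

def posterior_H(m):
    # Ordinary Bayes within each precise member of K.
    return m["p_H"] * m["p_E_given_H"] / p_E(m)

# Prior confidence in E is already an interval rather than a single number:
print(min(p_E(m) for m in K), max(p_E(m) for m in K))

# Condition every member on E; confidence in H is still an interval:
print(min(posterior_H(m) for m in K), max(posterior_H(m) for m in K))
# Each conditioned member assigns E probability 1, but the members disagree
# about everything else, so no single sharp posterior is forced on you.
```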

Jake Zuehl

Thanks for this public service!

John Encaustum

Nice writeup! Personally, my favorite treatment of these issues I've seen so far is Wolfgang Spohn's in The Laws of Belief. I'm still digesting it, and I'm not convinced that the ranking functions are a proper fix for the issues I care about most, but it's been extremely thought provoking for me and has inspired significant further reading into tropical arithmetic and some topos theory.

Some Lawyer

Don't we only have to be 100% sure that a certain quale was experienced to rely on Bayesianism? And is that such a terrible bullet to have to bite for fallibilism?

For example, let's say I would like to apply my likelihood P(E|H), but I acknowledge I don't have direct access to E (which is some evidence/fact unintermediated by polls, mental representations, etc.). Instead I have E', which is the quale I would have if I thought I *knew* I experienced E (but which might also result from certain things happening that aren't E). I also have the analogous H'.

So I use P(E’|H’) and P(E’) to update my P(H’). My P(E) also updates (based on the likelihood that E’ occurred given E) as does my P(H) (on the same reasoning but for H’ and H).

With this method we can simply deny, e.g., that we have access to P(E|E); instead we can only use P(E'|E) to update E (which is very slightly less than 1, reflecting risks that E' does not map to reality because of cognitive limitations, bad data, memory corruption, etc.).

We still have P(E’|E’)=1, but E’ is the sort of phenomenological experience to which we can attach total certainty while maintaining the essential components of fallibilism. After all, P(E’|E’) merely says we can be sure our thoughts and experiences exist, yet we remain fallible as to how they map onto the world (so P(E) can be below 1 even when P(E’) is at 1).

Probably no one actually uses this "mental states model" approach when applying Bayes, and yet people get good results. But I can simply acknowledge that, in applying Bayes, we generally act as if we're directly working with E and H instead of E' and H' as a simplifying convention. So this simplified approach does have P(E|E)=1, but it's not what I really believe as a fallibilist. Therefore, my belief in fallibilism is consistent with my Bayesianism.
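A toy numerical version of the E'-for-E move, with made-up numbers:

```python
# A toy version of the E'-for-E move, with made-up numbers.
# E  = the worldly fact (say, the poll really showed the reported result)
# E' = my experience as of seeming to learn E; only E' gets probability 1.
p_E = 0.5
p_Eprime_given_E = 0.99      # the seeming tracks the fact well, but not perfectly
p_Eprime_given_notE = 0.05   # misreadings, bad data, memory corruption, etc.

p_Eprime = p_E * p_Eprime_given_E + (1 - p_E) * p_Eprime_given_notE
p_E_given_Eprime = p_E * p_Eprime_given_E / p_Eprime

print(p_E_given_Eprime)   # roughly 0.95: high confidence in E, but no certainty
```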

David Riceman

Hartigan in the seventies did some work on having a set of priors. You might be able to get a similar effect by comparing a loss function over a deterministic guess and over a distribution. I'd guess you can find contexts in which there are distributions which are uniformly better. Then find a way to improve that, take a limit, and voila.

Ali Afroz

I think the actual solution to your problem is that probability theory is an approximation we use because, while we don't have hundred percent certainty, we are certain enough to treat the uncertainty as a rounding error. For example, in election forecasting you might not be certain that you remember the results of a poll correctly even if you just checked it a second ago, but you're certain enough to act as if it's 100% certain for practical purposes without causing large enough errors to be a problem. After all, to be very honest, probability theory relies on assuming a bunch of statements with 100% certainty, like that the probabilities of all the possibilities can't add up to more than one. Now, logical uncertainty even about basic and seemingly obvious mathematical statements is in fact pretty plausible, especially since we have been wrong in the past about which mathematical statements are definitely true (just look at Euclidean geometry), but we can probably have enough confidence in the basic axioms of probability theory to treat the shortfall from 100% certainty as a rounding error. So even though we don't actually have 100% certainty, we can use probability theory for practical updating in response to new information in real life.

Phoenix

Your reasoning for why Bayes's formula necessitates infallibilism seems fishy. You write the formula as "Pnew(H)=P(H)*P(E|H)/P(E)." That then lets you replace H with E and obtain the equation "Pnew(E)=P(E|E)," which you use to show that Pnew(E) is equal to 1.

However, I don't think you got Bayes's formula right. There's no such thing as "Pnew". The way I've always seen Bayes's formula, and I double-checked on Wikipedia, is "P(H|E)=P(H)*P(E|H)/P(E)." The "new" probability of H just means the probability of H given E. With that established, if you replace H with E, you just get the identity "P(E|E)=P(E|E)." And that obviously proves nothing.

Daniel Greco

The reason you'll sometimes see the term on the left as P(H|E) and sometimes as Pnew(H) is that another way of expressing the whole idea is this: when you learn E, your new probabilities for any hypothesis H should be your old probabilities, conditional on E. That is:

Pnew(H) = Pold(H|E)

Together with the definition of conditional probability, that gives you the rule as I expressed it. Put differently, the Wikipedia definition just relates the conditional probability P(H|E) to other quantities. It's a further claim that your new probability once you learn E should be your old conditional probability P(H|E), but given that further claim, there's no difference between how I put things and how you saw them on Wikipedia.
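Spelled out as a short derivation (just restating the two claims above):

```latex
% Conditionalization: new credences are old credences conditional on the evidence.
P_{\mathrm{new}}(H) \;=\; P_{\mathrm{old}}(H \mid E)
                    \;=\; \frac{P_{\mathrm{old}}(H)\,P_{\mathrm{old}}(E \mid H)}{P_{\mathrm{old}}(E)}
% Substituting H = E gives the point at issue:
P_{\mathrm{new}}(E) \;=\; P_{\mathrm{old}}(E \mid E) \;=\; 1
```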

Swen Werner

Greco’s post claims to expose a “dirty secret” in Bayesian reasoning, but ironically, it commits a fundamental misuse of Bayes’s rule. His Caribbean island example illegitimately invokes Bayesian updating in a context where no prior causal or epistemic connection exists between the predictor (the man in Times Square) and the outcome (the inheritance). Bayes’s rule only works meaningfully within a structured probabilistic model—his example is a textbook case of post hoc reasoning dressed up as formal inference. Far from revealing a flaw in Bayesianism, the post illustrates exactly what happens when you apply Bayes’s rule without understanding its assumptions. In this case, the article evidences a concerning lack of understanding of the very fundamentals of the subject it sets out to critique.

Daniel Greco

Your comment and some others here are making me realize I should've more explicitly addressed what I now see is a common misunderstanding. The idea that "Bayes's rule only works meaningfully within a structured probabilistic model", while defensible, is far from uncontroversial. Dissent from that idea is not "misunderstanding" but substantive disagreement. The people I'm calling Bayesians have a much more capacious view of the proper scope of Bayesian reasoning; they think that the sort of stuff Bayesian statisticians do when analyzing data is just a more explicit version of good informal empirical reasoning, as well as of the kind of reasoning scientists do when they decide whether to accept a novel theory (where you'll never have the kind of background frequency information you'd really want if you were going to build a convincing statistical model). And they think that we can distinguish better and worse informal empirical reasoning by thinking of it through the lens of Bayes's rule. I linked to Sean Carroll to make clear that the view isn't a straw man, but I could easily have found quotes from Bay Area rationalists, or, to take an example less familiar to Substackers but influential in the philosophy of science, Howson and Urbach's classic tome "Scientific Reasoning: The Bayesian Approach."

Swen Werner

User was temporarily suspended for this comment.
Ebenezer

Bayes' Rule is basically fiction anyways. E and H are mere abstractions which are supposed to correspond to reality. The simplest reconciliation is something like: "In a simplified world where we have random variables E and H which are supposed to correspond to X and Y aspects of the world, we can update our probabilities as follows: [...]"

This is something Richard McElreath talks about in his book/lectures I believe. The difference between the "small world" of the model and the "big world" in which we actually reside.
