I’ve agreed to review Timothy Williamson’s Overfitting and Heuristics in Philosophy for Notre Dame Philosophical Reviews. This post is not that review. But I will use it as a chance to talk about some themes from the book that may be of interest to a broader Substack audience that doesn’t (I hope!) exclusively consist of academic philosophers. Still, this post will be a bit more inside baseball than my standard fare. You’ve been warned!
We’ll ultimately get to the idea that there is a single hair—#32,207, perhaps?—that marks the boundary between being bald and not being bald. But getting there will require going via some general questions about the relationship between science and philosophy, and what the latter has to learn from the former.
A recurring idea in much recent philosophy is that there is no clean break between philosophy and other sorts of inquiry. Maybe Quine was the first person to explicitly defend the idea of a continuity between philosophy and science. He did so on the basis of controversial arguments, most centrally the rejection of the distinction between the a priori and the a posteriori. But lots of other philosophers who don’t accept Quine’s arguments still agree that coming up with a demarcation criterion cleanly separating science from philosophy is hopeless.
Go back a few hundred years, and while you’ll have a hard time finding anybody explicitly defending Quine’s position, that’s only because they didn’t conceive of “science” as its own endeavor distinct from philosophy. Rather, they talked about “natural philosophy”, which looks in retrospect like what we’d call science today, but with plenty of what we’d call philosophy mixed in. Today we think of Galileo and Newton as scientists, and Descartes and Leibniz as philosophers. But in their own times no such distinction was recognized; all of them would have been understood as doing natural philosophy.
While this position can be defended by abstract, principled arguments, I think the most powerful case is made by example. A few years ago James Madole and Kathryn Paige Harden published a paper called “Building causal knowledge in behavior genetics”, which addressed the question of how we can identify genetic causes of human behavior. A major part of their discussion, however, turns on questions about causation in general, and they engage deeply with the recent philosophical literature on the topic (e.g., the work of David Lewis). If you’re interested in whether and in what sense genes cause behavior, it seems to me you can neither ignore what look like metaphysical questions about the nature of causation, nor stay in your armchair and avoid engaging with data. Or to take another example, lots of salient questions about contemporary artificial intelligence—does it genuinely understand? Does it reason? If not yet, what would it take? What evidence that we currently lack ought to convince us?—can only be addressed at the intersection of computer science, cognitive science, and philosophy. Here’s a paper from a computer science conference citing and discussing, among other philosophical works, Michael Dummett’s book on Gottlob Frege’s philosophy of language.1
The idea that the boundaries between philosophy and science can be blurry should be relatively uncontroversial. But it can be a jumping-off point for wildly different arguments. Here’s one direction it can be taken. Philosophers who think it’s regrettable that contemporary philosophy is largely an armchair endeavor can appeal to the blurriness of the distinction, and to the history of philosophers asking what we now think of as scientific questions, to argue that there’s no principled reason philosophy shouldn’t be much more interdisciplinary and empirical than it currently is. In particular, the idea that philosophers who get their hands dirty with too many a posteriori questions are no longer doing philosophy is hard to square with much of what we tend to regard as great historical work.2
That is very much not the direction Timothy Williamson has taken the idea. Rather, he’s leveraged the difficulty of drawing crisp distinctions between philosophical and scientific methods as a way of defending armchair, non-empirical philosophy against the charge that it commits some methodological mortal sin. This was one of the main themes of his first book on philosophical method, The Philosophy of Philosophy, where he argued against the idea that there is some distinctively disreputable method philosophers use called “appeal to intuition”, which isn’t used in the sciences, and which we can and should do without. Instead, he contends that while it’s true that philosophers must make use of their judgment to identify suitable premises for their arguments, the use of judgment is an ineliminable part of inquiry everywhere, including in science. Arguments that attempt to cast doubt on a supposedly peculiar use of “philosophical intuition” end up collapsing into general skeptical arguments that would undermine all knowledge—and which should therefore be rejected—rather than targeted attacks on philosophical methods.
The more recent book I’m reviewing makes use of the blurry distinction between philosophy and science in a different way. Here, Williamson’s targets are not bomb-throwing critics of philosophy as a whole, but rather a motley assortment of philosophical views he thinks are the product of the methodological sin of overfitting.
Overfitting
Imagine a biologist who wants to predict a plant’s growth rate from sunlight, water, and soil nutrients. She records 100 data points and feeds them into a computer. The software can generate anything from a neat linear model with just a few parameters to a 100‑parameter monstrosity that perfectly fits every noisy measurement. The latter looks spectacular on the training data—but give it a fresh plant and it does a poor job making predictions. It has learned the random wiggles, not the underlying patterns.
Modern statistics and machine learning have a whole toolbox for avoiding that trap: train/test splits, cross‑validation, complexity penalties, Akaike and Bayesian Information Criteria—all formal ways of rewarding models that explain a lot with a little, and punishing those that “explain” data only by memorizing it.3 Williamson’s thought is that even in less formal philosophical contexts, where the “data” to be explained are judgments about things like knowledge, meaning, morality, or freedom, we don’t really explain them if we try to capture them with theories that are sufficiently flexible that they’d be able to accommodate any judgments. Just like scientists, we should prefer theories that are rigid enough to be potentially inconsistent with the evidence they’re meant to explain.4
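To see the trap in miniature, here is a toy sketch in Python. Everything in it is my own illustration rather than anything from Williamson: the made-up “growth” data, and a degree-15 polynomial standing in for the 100-parameter monstrosity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up "plant growth" data: a simple linear trend plus noise.
sunlight = rng.uniform(0, 10, size=100)
growth = 2.0 * sunlight + 5.0 + rng.normal(0, 3.0, size=100)

# Hold out most of the data as a test set, so overfitting has room to show up.
train, test = slice(0, 20), slice(20, 100)

def errors(degree):
    """Fit a polynomial of the given degree to the training data;
    return (training MSE, test MSE)."""
    model = np.polynomial.Polynomial.fit(sunlight[train], growth[train], degree)
    mse = lambda s: float(np.mean((model(sunlight[s]) - growth[s]) ** 2))
    return mse(train), mse(test)

for degree in (1, 15):
    train_mse, test_mse = errors(degree)
    print(f"degree {degree:2d}: train MSE = {train_mse:8.2f}, test MSE = {test_mse:8.2f}")
```

On a typical run, the flexible model does far better on the points it was fit to and far worse on the held-out points, while the humble straight line generalizes just fine. That asymmetry is what all the formal machinery is designed to detect and punish.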
Here’s a case study at the intersection of philosophy and economics. In those disciplines, a standard way of modeling an individual agent’s beliefs is with a set Ω of live possibilities. An agent believes some proposition A when A is true at every possibility in Ω, the agent disbelieves A when A is false at every possibility in Ω, and otherwise—when A is true at some possibilities, and false at others—the agent is uncertain. We represent finer-grained distinctions in her opinions—facts about which propositions the agent is more or less confident in, among the ones she’s uncertain of—with a probability distribution over the subsets of Ω.5 Two surprising features fall right out of this framework:
Closure under logical consequence. If the agent believes A, and all the A-possibilities in Ω are B-possibilities, she automatically believes B.
Indistinguishability of equivalent sentences. If two sentences are true in the same possibilities in Ω, the framework cannot represent the agent as believing one while failing to believe the other.
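To make those two features vivid, here is a minimal sketch of the framework in code. The example (a card drawn from a deck) and all the names in it are mine, but the structure is just the standard possible-worlds model described above.

```python
# A proposition is just the set of possibilities at which it's true.
# Suppose the agent has seen that a drawn card is red, so her live
# possibilities are the two red suits.
OMEGA = {"hearts", "diamonds"}

def believes(proposition):
    """Belief is truth at every live possibility."""
    return OMEGA <= proposition   # i.e., OMEGA is a subset of the proposition

RED = {"hearts", "diamonds"}
NOT_A_SPADE = {"hearts", "diamonds", "clubs"}
HEARTS_OR_DIAMONDS = {"hearts", "diamonds"}   # the same possibilities as RED

# Closure under logical consequence: she believes RED, every RED-possibility
# is a NOT_A_SPADE-possibility, so belief in NOT_A_SPADE comes automatically.
print(believes(RED), believes(NOT_A_SPADE))            # True True

# Indistinguishability: "red" and "hearts or diamonds" pick out the same set,
# so the model cannot represent believing one without the other.
print(believes(RED) == believes(HEARTS_OR_DIAMONDS))   # True
```

Notice that we never consulted the agent’s psychology: both features are baked into the representation itself.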
These features are sometimes called “logical omniscience”, and the fact that standard models of belief represent agents as logically omniscient is often thought to be a problem; isn’t it possible to believe P, where P entails Q, but to fail to believe Q? While that’s certainly a mistake, we should have a way of representing that sort of mistake, shouldn’t we?
It’s not so obvious. Let’s work through a simple example. Suppose a fair six‑sided die is rolled behind a screen. Initially our agent’s set of live possibilities is {1, 2, 3, 4, 5, 6}: one possibility for each face the die might have landed on, each at probability 1/6. Then she’s told that the die landed on a prime-numbered face, so her new set of live possibilities is {2, 3, 5}.
Assuming she updates her probabilistic opinions in the standard way, her new probability for these possibilities is 1/3 for each of them. A nice thing about this framework is that it meshes neatly with expected utility theory; we can make predictions and/or recommendations about which bets she’ll accept before and after learning that the die landed on a prime-numbered face. Moreover, these further claims about which bets maximize expected utility are not ad hoc additions—rather, they’re what inevitably follow given our reasonably natural starting assumptions.
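Here is the same toy example as a code sketch, again with my own illustrative details rather than anything from the book. The update rule is ordinary conditionalization, and the point is how much falls out of it for free.

```python
from fractions import Fraction

# The standard model: each live possibility gets a probability.
beliefs = {face: Fraction(1, 6) for face in range(1, 7)}

def conditionalize(beliefs, proposition):
    """Standard Bayesian update: discard possibilities outside the
    proposition and renormalize the rest."""
    total = sum(p for face, p in beliefs.items() if face in proposition)
    return {face: p / total for face, p in beliefs.items() if face in proposition}

def probability(beliefs, proposition):
    return sum(p for face, p in beliefs.items() if face in proposition)

PRIME = {2, 3, 5}
FOUR = {4}

print(probability(beliefs, FOUR))         # 1/6 before she learns anything
beliefs = conditionalize(beliefs, PRIME)  # each remaining face now gets 1/3
print(probability(beliefs, FOUR))         # 0: ruling out 4 comes for free

# A $10 bet that the die landed on 4 is now worth nothing to her.
print(10 * probability(beliefs, FOUR))    # 0
```

Those last lines are predictions, not stipulations: given the six-possibility model and the standard update rule, her probability that the die landed on 4 has to drop to zero, and a bet on 4 has to become worthless to her.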
But what if we want to represent a logically ignorant agent who isn’t sure which numbers between 1 and 6 are prime? Can we represent her as learning with certainty that the die landed on a prime-numbered face, but not learning with certainty that it didn’t land on 4? The short answer is that the most straightforward way of doing that is by using a framework for representing belief that is so flexible that everything—in particular, everything about how our agent’s beliefs will change in response to learning new information—has to be put in by hand. E.g., we can use a bigger set that doesn’t just include 6 possibilities, but instead includes 12—for each of the original possibilities, we can include a possibility where that number is prime, and another one where it isn’t.6 By brute force we’ve added in a possibility where 4 is prime, and allowed the agent to leave open that possibility when she learns that the die landed on a prime-numbered face. But we’ve done so at the cost of robbing our toy model of any predictive or explanatory power when it comes to the agent’s actions, or belief change over time; learning that the die landed on a prime number no longer has any automatic implications concerning what number—1, 2, 3, 4, 5, or 6?—it landed on.
How will the agent’s willingness to accept a bet to the effect that the die landed on 4 change in response to learning that the die landed on a prime number? In the standard model, with logical omniscience built in, we can get some non-trivial, surprising answers to questions like this. Granted, our example was simple enough that the answers aren’t that surprising. But in more complex cases, the framework can generate counterintuitive, but potentially illuminating results.
However, when we allow agents to be logically ignorant, we effectively rob ourselves of the ability to plug in some assumptions about an agent’s initial beliefs and values, and then see what they entail; once agents can be logically ignorant, any initial assumptions you start with will be compatible with any conclusions you like about how they’ll update their beliefs in response to new information, or what bets they’ll be willing to take.
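For contrast, here is one way (among many) of coding up the flexible, twelve-possibility model sketched above. The uniform prior over the extra “is this number prime?” coordinate is an arbitrary choice of mine, which is exactly the problem.

```python
from fractions import Fraction
from itertools import product

# The flexible model: a possibility now settles both which face came up and
# whether that face's number counts as "prime" for the agent.
possibilities = list(product(range(1, 7), [True, False]))  # (face, face_is_prime)

# Nothing in the framework constrains the prior over the second coordinate;
# it has to be chosen by hand. Here is one arbitrary choice: uniform.
prior = {w: Fraction(1, 12) for w in possibilities}

def conditionalize(beliefs, proposition):
    total = sum(p for w, p in beliefs.items() if w in proposition)
    return {w: p / total for w, p in beliefs.items() if w in proposition}

# What she learns: "the face that came up is prime", which in this model is
# just the set of possibilities whose second coordinate is True.
LEARNED_PRIME = {w for w in possibilities if w[1]}
posterior = conditionalize(prior, LEARNED_PRIME)

# Her probability that the die landed on 4 is whatever the hand-chosen prior
# said about the (4, True) possibility; learning "prime" doesn't rule it out.
print(sum(p for (face, _), p in posterior.items() if face == 4))  # 1/6 here
```

Pick a different arbitrary prior over the “prime?” coordinate and that final number can come out anywhere from 0 to 1. Nothing about it follows from what the agent learned, which is the sense in which the model has stopped predicting or explaining anything.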
The moral is that a noble aspiration—to accommodate the data that humans sometimes make logical mistakes—leads philosophers to a non-explanatory, “overfitted” theory of belief.7 Better to work with a theory that doesn’t accommodate all the data, if that’s the cost of explaining some of them.
Bullet Biters and Particularists
There’s a recurring divide in philosophy—sometimes explicit, sometimes just under the surface—between people who put more weight on abstract, general principles, and people who put more weight on judgments about specific cases. In a gem of a paper, Roderick Chisholm called the latter group, in which he placed himself, “particularists”, and the former he called “methodists”, largely as a joke. I’ll call them “bullet biters”. The bullet biter wants to find some simple, general theory—something clean and principled—and then follow where it leads, even if that means going to some strange places.8 The etymology, as far as I understand it, comes from the practice of soldiers being given a bullet to bite during amputations, when anesthesia was unavailable. Biting a bullet is a way of bracing for pain. The particularist, by contrast, starts with their judgments about individual cases, and is wary of any theory that strays too far from them.9 Particularists, in short, avoid (philosophical) pain.
The contrast is very clear in ethics. A utilitarian, for example, starts from a simple principle—maximize happiness—and lets that principle decide every case.10 If that means lying, breaking promises, or pushing one person in front of a trolley to save five, so be it. These are the bullets that utilitarians famously bite.11 Principled Kantians are bullet biters too. I’ve heard it claimed that the most influential contemporary Kantian, Christine Korsgaard, thinks it’s immoral to throw people surprise parties, because of the deception involved. At the other extreme, we have the moral particularist, who holds that “there are no defensible moral principles, that moral thought does not consist in the application of moral principles to cases, and that the morally perfect person should not be conceived as the person of principle.” The moral particularist doesn’t need to bite any bullets—if she judges that it’s wrong to push one person in front of a trolley to save five, her theory lets her say so—but she also can’t explain or justify her judgments about particular cases by appeal to general principles, because she doesn’t endorse any; according to the particularist, ethics is a matter of good judgment, and good judgment can only be imperfectly captured by general principles.
But it’s not just in ethics that we see this sort of divide. Some metaphysicians want a simple ontology that obeys a few tidy laws, even if it leads to surprising or revisionary consequences (e.g. nothing exists other than elementary particles). Others are more pluralistic, building complicated taxonomies to better accommodate our pre-theoretical beliefs about what sorts of things exist, and what sorts of things do not. It’s natural to see physicalists about consciousness—people who think that facts about conscious experience are ultimately nothing over and above facts about brain function, even if it’s admittedly hard to see how to explain consciousness in physical terms—as bullet biters. By contrast, people who insist that consciousness is too weird and distinctive to be explained in reductive physical terms are, on that debate at least, closer to the particularist end of the spectrum.
Seen through this lens, Williamson is a paradigmatic bullet biter. The book that made his name in philosophy was a defense of the idea that vague terms like “bald” or “tall” have precise boundaries, though we can’t know exactly where those boundaries are. This position—called epistemicism—entails that there’s a specific number of hairs that marks the dividing line between being bald and not. Most people find that hard to believe, to put it mildly. But he does a much better job than I at least would have thought possible of arguing that all the alternatives on offer are just as incredible. Very roughly, he argues that his view allows him to preserve the power and simplicity of classical logic, where every statement—including statements to the effect that someone with 32,207 hairs is bald—is either true or false. It’s very hard to avoid the consequence that there is some smallest number of hairs that suffices for not being bald without revising classical logic, or holding that logic is inapplicable to natural language. Both options involve serious costs.
In the recent book, various positions Williamson has argued for elsewhere, and which I won’t explain here—not just epistemicism about vagueness, but also the view that the natural language indicative conditional is the material conditional, as well as the view that formal semantics should be done in an intensionalist, rather than hyperintensionalist, framework—are all defended by arguing that accommodating the data that counts against them would amount to overfitting. Better, he says—though not in this language—to bite the various bullets that are involved in holding the views just mentioned.
The military metaphor of bullet-biting can make it seem like it’s motivated by a kind of macho bravado: do you want to be a weak, wilting particularist? Are you afraid of a little (philosophical) pain? As I read Overfitting and Heuristics in Philosophy, it’s a defense of bullet biting that aims to put the practice on sounder footing than that picture would suggest. The reason to bite bullets isn’t that it’s fearless or badass. Rather, a willingness to bite bullets is the price of admission to a certain kind of philosophical-cum-scientific theorizing. If we want theories that do more than trace the shape of our pre-theoretical judgments—if we want theories that explain—then we need to accept the kinds of constraints that sometimes force us to bite bullets.
If “Michael Dummett” and “Gottlob Frege” are not familiar names to you, that’s kind of my point! They’re both very much philosophers’ philosophers, but it still makes sense to engage with their work in thinking about whether LLMs understand language.
I’m gesturing here towards what’s sometimes called the “negative program” in experimental philosophy.
It’s common to point out that contemporary AI involves building models with astronomical numbers of parameters, and which don’t seem to suffer from overfitting. (See, e.g., here.) So should we give up on the idea that there’s anything wrong with overfitting, whether in science or philosophy? Williamson addresses the concern. Very roughly, his response is that even if modern AI can make successful predictions while using tons of parameters, in philosophy (and some parts of science) we’re usually more interested in understanding—which you can’t get from black box models with trillions of parameters—than prediction.
Shades of Popper here.
Not necessarily all subsets, I know; don’t @ me about non-measurable sets. I’m trying to keep this accessible.
How should we interpret these possibilities where 4 is prime? Good question. Also, this certainly isn’t the *only* way of trying to represent a logically ignorant agent. But in my view, the awkwardness here is illustrative of the awkwardness of alternative methods. Painting with a broad brush, attempts to model logically ignorant agents tend to face a dilemma. Either you build in enough structure that sparse assumptions about the agent’s initial knowledge have rich consequences for things like learning and choice, in which case you’re implicitly treating the agent as knowing a lot of logic, and so not really doing what you set out to do; or you don’t, in which case you end up with a model with no explanatory power: facts about learning and/or choice have to be added in manually, one by one, rather than derived from sparse assumptions.
How should we accommodate the data that humans sometimes make logical mistakes? Here I differ from Williamson, but that’s a topic for another day. If you’re interested, I discuss it here.
I leave open the intriguing suggestion from Flo Bacus that bullet-biting is always only provisional; at the end of inquiry, everything makes sense, and there is no (philosophical) pain.
Perhaps this is too obvious to mention, but it seems to me this distinction is more like a spectrum than a binary.
I’m aware this is a bit of a caricature. One way of arguing for utilitarianism is to start with some simple cases—e.g., Singer’s famous drowning child case—and put a lot of weight on them, holding that the only way to do justice to them is by adopting utilitarian general principles.
Which is not to say that they have nothing to say to make them go down easier. They absolutely do!