I’ve agreed to review Timothy Williamson’s Overfitting and Heuristics in Philosophy for Notre Dame Philosophical Reviews. This post is not that review. But I will use it as a chance to talk about some themes from the book that may be of interest to a broader Substack audience that doesn’t (I hope!) exclusively consist of academic philosophers. Still, this post will be a bit more inside baseball than my standard fare. You’ve been warned!
I loved this article. Elegant and full of great examples/explanations.
About science and particularity: yesterday I was watching Fire of Love, a documentary about a volcanologist couple that visited and filmed well over 100 live volcanoes. At one point Maurice (who did the press tours) is asked about the science of volcanology. And he reports his frustration with the way that his colleagues classify volcanoes, insisting that each volcano has its own “personality” and should be studied on its own terms.
I think some philosophers have this attitude. They are after some other epistemic good besides “explanation” in the form of subsumption under a simplified pattern. They are more like the volcanologists, in love with each volcano, than the theorists, trying to find the deep patterns common (or at least common enough) to them all.
I feel like this is a common feature of scholarship in the humanities - it’s all about the particularities of each thing, with a resistance to any sort of universalizing message.
A quick follow up. On second thought, I wonder if some “volcanological” particularists actually see themselves as doing theory-building, just in a more granular way. Then the overfitting charge undoubtedly sticks.
I do take seriously the fact that overfitting seems not to be a problem for prediction in contemporary machine learning with huge datasets. In some domains, you might think we're just not going to get the kind of understanding that subsumption under simple, general patterns provides: they're too complex. Maybe the best we can do in such domains is train the giant neural networks that are our brains to make good judgements using tons of data, even if we can't then articulate how we make our judgements using general principles. Maybe that's how particularists understand ethics.
Overfitting absolutely is a problem in ML, even with a huge dataset.
If your dataset is big enough, the model cannot overfit on all of it because the complexity required exceeds the size of the model. But it can, and will, overfit on parts of the dataset. That's why LLMs will sometimes regurgitate chunks of their training data verbatim. Moreover, if the dataset itself contains certain biases, those biases will be reproduced in the model.
"Seems not to be a problem" is certainly too strong. Maybe I should've said something like: "you'd think overfitting would be a devastating problem for models with trillions of parameters, and it turns out it's not--performance of trillion-parameter models on novel data is often better than performance of billion-parameter models--as long as you're training on large enough datasets that even trillions of parameters aren't enough to fit them entirely by memorization." Does that sound better?
It’s definitely weird, but the memorization threshold isn’t the crucial issue here. There are proofs that you can get effective learning even with many more parameters than data points, as long as the regularization is appropriate. Regularization is what prevents memorization in the high-parameter regimes.
The simplest case of regularization, “Tikhonov” L2 regularization, implicitly “multiplies your data points infinitely by adding white noise to each of them.” (It doesn’t explicitly generate the noisy data, but the algorithm result is formally identical to the result of the process that does generate the infinite data.) The algorithm can’t memorize infinite noisy data, and it doesn’t memorize the data points without noise since it “forgets which ones are the originals.”
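A minimal numerical sketch of that equivalence, for the plain linear least-squares case (this is illustrative code of my own, not anything from the thread; in this particular setup the Tikhonov penalty corresponding to input noise of standard deviation sigma works out to n·sigma^2):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 5, 0.3                      # data points, features, noise level

X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Tikhonov / ridge closed form; the penalty matching input-noise std sigma is n * sigma^2
lam = n * sigma**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Ordinary (unregularized) least squares on many noise-corrupted copies of the data
K = 20_000                                    # stands in for "infinitely many" copies
X_noisy = np.vstack([X + sigma * rng.normal(size=X.shape) for _ in range(K)])
y_noisy = np.tile(y, K)
w_noisy, *_ = np.linalg.lstsq(X_noisy, y_noisy, rcond=None)

print(np.max(np.abs(w_ridge - w_noisy)))      # small, and it shrinks as K grows
```

As K grows the two solutions agree to more and more decimal places, which is the “formally identical” claim in miniature.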
Huh, cool!
Yes, I think the nontechnical messaging around the technical issue of “double descent” is pervasively confusing. Some people talk about it as if it means the models stop overfitting, but practitioners are more aware that there can still be application-fatal overfitting. A lot hinges on whether the model was regularized “properly”, which amounts to baking in intuitive heuristics like Occam (for L1) or preferences for model robustness (for dropout)… but relating the intuitions to exact application is tricky and those conversations are hard to make clear.
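To make the double-descent picture concrete, here is a rough toy sketch (again my own illustration, nothing canonical): unregularized minimum-norm least squares on random ReLU features, swept across the interpolation threshold. With a single random seed the curve can be noisy, but the test-error bump near n_features ≈ n_train is usually visible, and it flattens out if you add even a small ridge penalty.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test = 40, 500

def target(x):                                 # the function we are trying to learn
    return np.sin(2 * np.pi * x)

x_tr = rng.uniform(-1, 1, n_train)
y_tr = target(x_tr) + 0.1 * rng.normal(size=n_train)
x_te = np.linspace(-1, 1, n_test)
y_te = target(x_te)

def relu_features(x, W, b):                    # random ReLU feature map
    return np.maximum(0.0, np.outer(x, W) + b)

for p in (5, 10, 20, 40, 80, 200, 1000):       # number of random features
    W, b = rng.normal(size=p), rng.normal(size=p)
    Phi_tr, Phi_te = relu_features(x_tr, W, b), relu_features(x_te, W, b)
    w = np.linalg.pinv(Phi_tr) @ y_tr          # minimum-norm fit, no regularization
    test_mse = np.mean((Phi_te @ w - y_te) ** 2)
    print(f"p = {p:4d}   test MSE = {test_mse:.3f}")
# Typical pattern: error falls, spikes somewhere near p ~ n_train, then falls
# again as p keeps growing; averaging over several seeds makes the shape cleaner.
```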
Just a (non-earth) scientist here — how much of this is due to the fact that volcanoes are pretty idiosyncratic objects of study; there aren’t that many of them, no two volcanoes can occupy exactly the same space (relative to ocean, tectonic plates, etc), and so there really isn’t such a thing as a “standard-issue volcano” and developing a Grand Unified Theory of Volcanoes just isn’t the most feasible endeavor?
Compare of course to something like biology where even within some specific sub-population of interest, there are many carbon copy-esque specimens one can study.
I really liked this article. It looks like you present the outline of a solution to the paradox of inference, which IMO is a problem well worth solving.
Plus, Quine, Dummett and Frege references. Yay. Don't think I have seen any of these on Substack before now.
BTW, just in case you wonder, no, I am not a relation of Timothy Williamson.
I don't think I see the connection to the paradox of inference, though I'd love to hear it.
I will have to get back to you when I have a lot of energy, if that's okay.
At least three of my current papers can be summarized as “Williamson is wrong” (about evidence being a proposition, about how far we are from cognitive homes, and about the relationship between knowing how and knowing that). In each case, I think he gives a nice and elegant theory, but for the most part they just aren’t that useful, and I think that’s the core of this issue. An infinitely flexible account that lets you accommodate everything doesn’t predict anything - but a highly rigid theory that says there is a precise fact of the matter, but it’s just one that we are incapable of knowing anything useful about, is just as useless.
I love this point. In the course of writing my actual review, I find myself trying to voice the following frustration. He's often willing to dismiss this or that phenomenon as noise, which he'll say we could only accommodate with an overly inelegant, overfitted theory. For instance, the difference in cognitive significance between "Clark Kent is Clark Kent" and "Clark Kent is Superman." By his lights, both sentences mean the same thing, both beliefs have the same content, and any differences in the roles they play in our cognitive lives are irrelevant to formal semantics, formal theories of belief, etc.
I find myself wishing he would be more explicit about what it is he thinks formal semantics, formal theories of belief, etc., are supposed to *do*, if explaining those differences isn't a big part of the story.
>"what it is he thinks formal semantics, formal theories of belief, etc., are supposed to *do*"
I think Williamson's metaphilosophy would reject that question as inappropriate. We're only supposed to ask whether the claims of (e.g.) a formal semantic theory are True or False. The pragmatic question of why we care about the theory in the first place is deemed irrelevant to the question of its truth or falsity.
From TPOP (2nd edition, p. 540): "the motivation for asking [a philosophical question] should not be confused with the question’s content."
Tried to reply already but not seeing it, so apologies if this ends up as a double post.
While I see your point, I don't see how to square it with what he says about model building in philosophy. It's a pretty standard line, which I think he accepts, that models aren't supposed to be literally true, but are supposed to illuminate some important patterns/phenomena, especially the sorts of models you see in formal philosophy (as opposed to giant models in meteorology, or the ones the fed uses, which really are supposed to be as accurate as possible).
Once you think of what you're doing as model building, I don't see how you can avoid letting some amount of pragmatism in; you need to distinguish the underlying phenomena you really care about illuminating from the stuff you're willing to regard as noise.
Yes — if you think of model-building as inescapably pragmatic, then assessing the success of a model requires an account of the model's intended purpose; it becomes difficult to reconcile the claim that "philosophy is about model-building" with the methodological constraint that a truth-conditional semantics for philosophical language should be definable independently of pragmatic concerns.
So Williamson must believe he has a conception of "model-building" that isn't inescapably pragmatic. And it would be nice if he would let us know what it is!
The “norms of assertion” stuff is the biggest example of this to me. What is a “norm of assertion” supposed to be if it’s not the full Gricean thing, with four or more different categories of features that are differently relevant for different conversations depending on their purposes?
I don’t accept your analogy between scientific and philosophical theorizing, and I doubt that overfitting is even a problem in philosophy. You suggested that we sometimes should bite bullets to increase the predictive power of our theories (or to avoid merely accommodating pre-theoretical judgments). I accept this in the case of scientific theorizing but not in the case of philosophical theorizing. The difference with philosophy is that many philosophical theories are not empirically testable. Often, the only thing we have to rely on in philosophy is our judgments.
Overfitting is bad in science because it leads to poor outcomes. If we overfit a model to the growth rates of particular plants, then it will poorly predict the growth rates of other plants. Crucially, scientists get feedback from the world. If their models are overfitted, then they do not work well. If their models are fitted well, the models lead to successful empirical predictions.
As I understand Williamson, he’s saying that there’s nothing special about philosophical methodology. Just as scientists rely on their judgments, philosophers must rely on their judgments. Presumably, then, overfitting in philosophy is bad for the same reason that it is bad in science: overfitted philosophical theories yield the wrong predictions. The problem is that philosophical theories’ predictions usually don’t have any practical consequences. Whether epistemicism is true, whether numbers exist, whether there are concrete possible worlds, I could not possibly look at the world to determine these things.
Philosophical predictions often do not consist in empirical predictions. They consist in predictions about what further judgments I should make. Pretty much the only way to “test” a philosophical theory is by comparing it against other philosophical judgments I have. This suggests that philosophers only have their judgments, whereas scientists have judgments and empirical evidence. If I am right, then I don’t see a problem with fitting our philosophical theories to our judgments.
I think this is a very reasonable line of questioning, and I see it as in the same ballpark as the topic Andrew Sepielli mentions he's going to cover in his seminar above. In science you can learn you're overfitting by testing a model's performance on new data and seeing that it does poorly. In philosophy you generally don't have opportunities for those sorts of clean tests.
I'm not sure if this is just a temperamental or aesthetic judgment--I hope not--but it still seems to me that it's worth aiming for explanation in philosophy, and that requires seeing overfitting as a bad thing. Here's an example I like. It's a consequence of the standard Bayesian formalism for representing belief and belief change, along with some (contestable!) assumptions about the structure of evidence, that absence of evidence is (perhaps small) evidence of absence: if learning E provides evidence for H, then failing to learn E provides evidence against H.
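For concreteness, the derivation is essentially one line (assuming 0 < P(E) < 1, so that both conditional probabilities are defined):

```latex
% Law of total probability: P(H) is a weighted average of the two conditionals.
\[
  P(H) = P(H \mid E)\,P(E) + P(H \mid \neg E)\,P(\neg E)
\]
% If the E-branch sits strictly above the average, the not-E branch must sit
% strictly below it:
\[
  P(H \mid E) > P(H) \;\Longrightarrow\; P(H \mid \neg E) < P(H)
\]
```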
That's counterintuitive--it's a familiar slogan that absence of evidence is not evidence of absence! By my lights, once you see how it follows from some pretty natural assumptions, the reasonable thing to do is to discard your initial judgment that absence of evidence is not necessarily evidence of absence, and to regard yourself as in possession of an explanation as to why it must be. I admit there's no knock down empirical case for this. You can reject the assumptions about how evidence should be formalized that lead to the result. But that seems to me to leave you in a worse place, from the standpoint of the sort of systematic understanding that I take it philosophers tend to crave. But maybe not all philosophers! Maybe if my sensibilities were more particularist, I'd be less sympathetic to the idea that there are any general formal principles about evidence that could lead to surprising consequences like the one just mentioned.
Extremely clear and interesting post. This may sound out there, but I was reminded of a paper by Erik Hoel on dreams and the overfitting brain hypothesis. Hoel argues that the hallucinatory nature of dreams may serve to rescue the generalizability of experience against overfitting, and he draws on work on deep learning to make his point. This was back in 2021, and I wonder how he would construe current-day LLM hallucinations. If he’s right, it may be that hallucinations vs. overfitting are the tradeoff and models pay that price as a byproduct of their attempts to generalize despite having trillions of parameters. Or something to that effect. And the interesting analogy would be: biting bullets is a bit like hallucinating: it will make sense eventually.
The paper: https://www.sciencedirect.com/science/article/pii/S2666389921000647
So I saw this comment previously and wanted to meaningfully circle back, but…who am I kidding; I’m on here to vegetate and not have more papers to take a serious look at, lol.
That said, the guy is a biologist and almost necessarily predisposed to be interested in the ‘neural correlates’ of ANNs if he’s going to talk about AI. Rich history of tortured analogies there, though I certainly admire the dedication of Hinton et al.
My working quibble is that hallucinations (evocative term notwithstanding) and questions of generalization are, while extremely nontrivial, actually quite mundane. Eg, knowing how to drive in Boston after having learned in SF. Not inventing random streets in one’s mental map of Chicago. Dreams can be very mundane, too (eg, imaginary normal conversations w/coworkers), but the author seems all too eager to dismiss dreams w/”high emotional valence” — their frequency, their neurobiological significance, etc. I guess it’s fine if he wants to limit his analysis to the sorts of boring dreams we have about work, but it’s not like Dadaist nightmares about fire-breathing dragons and the end of the world don’t exist for most people. If he doesn’t want to touch anything Freudian-tinged, he needs to specify that he is working with a very limited subset/definition of “dreams” — the mundane unemotional sort — bc for the purposes of his analogy w/emotionless machines, that’s the only sort that works.
When has one bitten a bullet? What’s the literal content of the pain metaphor? Is it just denying something that seems true? So any solution to a philosophical paradox must bite a bullet? I’m trying to figure out what bullet epistemicists have to bite.
Tangentially related to this: ‘Precise’ in ‘The book that made his name in philosophy was a defense of the idea that vague terms like “bald” or “tall” have precise boundaries, though we can’t know exactly where those boundaries are’ isn’t right, since Williamson isn’t denying that the boundaries are vague and ‘precise’ is usually just used as an antonym of ‘vague’. ‘Sharp’ might be better.
But in any case, that (e.g.) there is some specific number of hairs such that one is bald iff one has that many hairs or fewer is as much a consequence of supervaluationism and other classical views as it is of epistemicism.
I tend to disagree with Williamson about what "precise" means in natural language. So while I'd stick with his terminology in a philosophy paper, for the sake of glossing his view for a general audience I don't feel obligated to defer to his usage.
And on the idea that other views have to bite some of the same bullets as epistemicism, I absolutely agree! That's part of what I mean by saying he makes a powerful case in the book that it's very hard to give a theory of vague language that avoids commitment to claims comparably bizarre-sounding to the ones the epistemicist asserts--sometimes the very same ones, sometimes others.
‘In some cases, it is either true or false that NN is bald, but “NN is bald” is neither true nor false.’
I think a tension in views that are both physicalist-naturalist and bullet-biting is that it seems very hard to fit the foundational principles that the physicalist-naturalist uses in their bullet biting into a physicalist-naturalist worldview.
I think this tension comes from the fact that the natural sciences, which the physicalist-naturalist argues should be the (sole?) guide to the fundamental state of the world, rely centrally on methods which can't obviously be fit into at least the physicalist worldview, like Bayesian probabilities, causation and mathematics.
There's a second problem for the naturalist: it's unclear how the natural sciences can discover the correct methods for doing natural science. I'm much more sympathetic to naturalist bullet biting here, though, than I am to the physicalist position.
Everyone will have to bite a pretty unpleasant bullet when building the foundations of how to acquire knowledge, and circularity isn't obviously much worse than any of the others. It seems like the naturalist, though, doesn't have to be committed to a physicalist ontology, and so commitment to that ontology looks much more like scientism to me than naturalism does (synthetic a priori knowledge is also a pretty unpleasant bullet to bite!)
I think the naturalist view does fall apart, though, if one wants to use unjustified axioms to ultimately ground one's view. I think in the worst case this is just straightforwardly inconsistent while also being a quite common view amongst lots of people with exposure to the natural sciences (I suspect that lots of LessWrong style rationalists implicitly hold views like this.) Part of the motivation for writing this is also reflecting on my own views. Before getting exposure to the philosophy of science I certainly held foundationalist views about epistemology, and I was certainly sympathetic to the argument that we should strongly defer to findings of natural sciences (and I still think that often we should!)
Hello! Just some loose thoughts — forgive me if insufficiently informed — about what you write here.
- My understanding is that the explanatory and predictive power of highly idealised models in economics is a matter of considerable controversy. Maybe they therefore don’t inform the best case for bullet biting. (Obviously, these models are also controversial in philosophy.)
- With respect to the toy example, suppose we want to predict the betting behaviour of people with cognitive problems, who may not remember which numbers are prime, and who might struggle to apply the definition to cases. Then perhaps logical omniscience is an inappropriate idealisation. You would need to fill in the gaps, to be sure, but would that necessarily amount to overfitting? Or consider explanations of famous reasoning mistakes, like Wason’s selection task. Whether or not we can explain these adequately with a possible worlds model of belief, it would surely be unfair to lump together alternative models with tortured polynomials.
- Sometimes in epistemology it’s helpful to idealise because you want to abstract away from memory and other limitations. That strikes me as especially appropriate because epistemology is, I think, normative: it’s about good reasoning and things like that. If we want to study how knowledge is transferred by testimony, for instance, it makes sense to consider ideal agents, so that knowledge transfer is not affected by irrelevant factors. I know that we can sometimes learn about actual reasoning by seeing it as an approximation of normatively ideal reasoning. I am also aware that idealisations can be appropriate even when not modelling normative ideals, but rather modelling actual phenomena directly, as it were. Nonetheless, it seems to me that there is a large space of good science that lies between accepting highly idealised models of belief, and a dithering, unproductive particularism.
- I like an analogy from computer science. You might want to predict how quickly your computer will run a programme as a function of input size. Up to a point you can get by with a fairly abstract RAM model, and there’s nothing wrong with that for some explanatory purposes. (Why does this sorting algo take much longer than this one?) But if you want a highly accurate prediction, or a more detailed explanation, you will need to particularise your model, taking into account all the fancy workings under the hood. Even very particular models of specific architectures are powerful and explanatory, as any good systems textbook can attest. (Of course they too idealise away from all the physics.) A toy timing sketch illustrating this point appears after this list.
- I guess I sometimes worry that this overfitting rhetoric could degenerate into just that: rhetoric. It would be good to see specific and uncontroversial examples being discussed in detail. (Oh no detail!) I find the metaphor of overfitting a curve to data to be quite unhelpful when adjudicating between, say, different models of computation, or different models of belief or reasoning.
- It seems to me that different extents of idealisation are appropriate for different purposes. Unless one has a reason for systematically privileging the most idealised models as guides to the metaphysics, accepting highly idealised models does not need to come with any bullet biting. (Is my thinking.)
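Here is the toy timing sketch mentioned in the computer-science bullet above. It is purely illustrative (the numbers are machine-dependent, and the selection sort is just a deliberately naive implementation): the abstract Θ(n²) vs Θ(n log n) analysis already explains the big gap and how it scales, while the exact constants are where the under-the-hood details come in.

```python
import random, time

def selection_sort(a):                         # a deliberately naive Theta(n^2) sort
    a = list(a)
    for i in range(len(a)):
        j = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[j] = a[j], a[i]
    return a

for n in (1_000, 2_000, 4_000):
    data = [random.random() for _ in range(n)]
    t0 = time.perf_counter()
    selection_sort(data)
    t1 = time.perf_counter()
    sorted(data)                               # the built-in Timsort, Theta(n log n)
    t2 = time.perf_counter()
    # Doubling n roughly quadruples the first time but only a bit more than
    # doubles the second, as the RAM-model analysis predicts; the absolute
    # numbers (caches, branch prediction, interpreter overhead, ...) are where
    # the more particular models earn their keep.
    print(f"n={n:5d}  selection_sort: {t1 - t0:.3f}s   built-in sorted: {t2 - t1:.5f}s")
```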
Not sure how interesting or controversial any of that is! But thanks for your post; this stuff has been on my mind lately.
I like the example of the Wason task. I agree it's a good phenomenon to take as an explanandum. My sense is that no extant non-logically-omniscient theories of belief do a good job with it. Of course, you can take an impossible worlds theory of belief and add in by hand all the stuff you'd need--make sure the agent knows that it follows from "you need to be 21 years old to drink" that you need to check both the drinkers, and the <21 year olds to ensure compliance with the rule, but let the agent fail to know that it follows from "odd-numbered cards have a vowel on their obverse side" that you need to check both the odd-numbered cards, and the cards with consonants. But it doesn't fall out of the theory; you could just as easily have predicted the opposite result.
Here's a different example. Take something like prospect theory. It's explicitly meant to be a kind of de-idealized theory of decision-making that can make sense of some phenomena that traditional expected utility theory can't. But it's still constrained enough that it can't make sense of just *any* behavior. I'm (at least tentatively) a fan! My sense--happy to be corrected--is that it's still about the best we've got for making sense of various behavioral phenomena (e.g., loss aversion). I don't know of any comparably illuminating theories of logically non-omniscient belief. That is, theories where there's some robust empirical phenomenon (like loss aversion) where it falls out as a reasonably natural, non-ad-hoc prediction of the theory, rather than something that you can get if you set the parameters just right, but where you could just as easily and naturally have gotten the opposite prediction if you'd set the parameters differently. I'm not of the view that there's some principled reason why there couldn't be such theories, nor am I opposed to the search for them. But I do think conceiving of the constraints in the sorts of terms I've been discussing--i.e., you should want a theory where the kinds of logical mistakes we make come out as something predicted by the theory, rather than merely something that can be described in the language of the theory--is fruitful.
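To illustrate the kind of "falling out" I have in mind, here's a minimal sketch of prospect theory's value function (probability weighting omitted; the parameter values are the commonly cited Tversky and Kahneman 1992 estimates, used purely for illustration). With a loss-aversion coefficient above 1, declining a symmetric 50/50 gamble is a direct consequence of the theory, not something you have to wire in case by case.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value of a gain/loss x, relative to the reference point."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

# A 50/50 gamble: win $100 or lose $100.  Expected monetary value is zero, but...
gamble = 0.5 * value(100) + 0.5 * value(-100)
print(value(100), value(-100), gamble)   # ~57.5, ~-129.4, ~-36: the gamble is declined
```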
Thanks for your response! I think that your framing is very illuminating. There does seem to be a major difference between expecting logical mistakes to fall out of the theory, vs being merely expressible in it.
I still wonder whether we really have to choose here. Again I think that models of computation offer an illuminating analogy. A general model of computation might begin with Turing Machines or RAMs or whatever else. Likewise, a general model of belief might begin with possible worlds. These can offer insights about concrete computers and believers, respectively. But at least in the case of computers, there comes a point where it is unreasonable to expect such models to be of much use. They are too idealised to explain (e.g.) why the same C code is slower on this machine than on that one, to the specific extent that it tends to be. At some point, good scientific practice models the finer grain of computer architectures and algorithms. Likewise, I would expect that a mature science of belief will eventually incorporate rich specifications: that much of why and how we reason is best explained in a model that, yes, could have been tweaked any which way. I don’t think this is bad science. As in the case of computers, scientific theorising can head in that direction when prediction and explanation become hungrier for precision and specificity.
The example you give of prospect theory is a helpful tip! I don’t have the expertise to assess the merits of that programme. I see no reason to depart from your tentative endorsement of it. But I am sceptical that it’s the whole story. I think I stand on reasonably firm ground when I say that, contra for instance vision science, the science of belief remains a relatively immature branch of psychology. Actually vision science is an independently interesting case, because logical omniscience is seemingly not appropriate there, at least not when we expect our model to be metaphysically telling. Many truths are not perceptually representable at all, so we cannot say that perception is closed under logical consequence, or whatever notion of consequence attaches to perception.
I am not a big fan of impossible worlds conceptions of propositional attitudes. I am drawn to neo-Fregean proposals. I have no idea how much these accounts can recover of the explanations that you prefer, nor how easily.
I reckon I need to read your book!
I like the case of computing, but I think it's instructive in part because it makes it easier to describe some important *differences* between folk psychological descriptions of humans on the one hand, and program or algorithm-level descriptions of computers on the other. Basically, I think program-level descriptions of what computers are doing--even though they involve abstracting away from lower level implementation detail--still "carve at the joints" much much more closely than do folk psychological explanations of human thought and behavior. Computers are intelligently designed by humans so that they perfectly implement abstract, program-level descriptions. Humans were designed by evolution, and evolution wasn't trying to make sure that the operations of the brain could be really neatly characterized in terms of propositional attitudes. My read of the LOT tradition in psychology/phil mind is that it takes the opposite view; as we get better at psychology, we're going to find that categories pretty close to our pre-theoretical common sense categories for categorizing the mind (categories like "belief") also end up as fruitful categories to use in giving a much more scientifically mature, predictively accurate, computational description of the brain.
Maybe! But it strikes me as pretty optimistic. If you put a gun to my head and forced me to bet, I'd bet that in 50 years we'll have a significantly improved understanding of the brain, and neuroscience, and psychology, but it won't exactly look like a scientifically mature theory of *belief*, and it will still be tricky/a matter of interpretation to translate between what we learn and propositional attitude talk. I don't want to go all the way to the Churchlands' position; I think it's a good bet that propositional attitude talk will continue to provide the best "bang for buck" in terms of how we think and talk about ourselves when we can't put each other in brain scanners and subject the results to extensive analysis.
LLMs are a useful analogy, I think. We know the basic algorithms they're running, and there's nothing like "belief" at the most fundamental level. Nevertheless, it can be useful to conceive of them as having beliefs and desires in trying to predict and explain how they respond to prompts. We know it's not exactly right, but there's probably no simple, illuminating, tractable way of explaining why they responded to such-and-such prompt in such-and-such way that's exactly right. So it may remain a useful heuristic for making sense of them. While I'm sure there are important differences between human cognition and how LLMs work, my guess is that folk psychological descriptions of us are not *much* more joint-carving than they are in the case of LLMs.
I see! My suspicion, for the little it’s worth, is that the folk categories will be replaced by various constructions out of the basic notion(s) of mental representation, where those constructions are more joint-carving than the folk categories. We then have the questions of (i) whether the basic notion(s) of mental representation should be understood hyperintensionally, and (ii) if so, how the representational notion(s) should be modelled. If you agree with Tyler Burge on perception, we seem to require a hyperintensional notion of mental representation to make sense of mainstream vision science. If that is correct, then my bet is that (i) gets an affirmative answer. I have no idea about (ii).
Really nice discussion! I haven't read Williamson's book, but I'm interested in the general issue and also trying to find materials for a section of my grad seminar about simplicity and unification in ethics. Does Williamson ever get into why more explanation in ethics is better? Or is that not really the focus of the book?
Definitely not. I'm not sure ethics comes up at all. I think there are pretty high barriers to entry for the book; if you already know the other debates he's contributed to, it's nice to see how he sees them as hanging together. But he covers a lot of ground without really providing much setup, so if you're not already up to speed, I imagine it would be easy to get lost.
in my experience, the kind of bullet biting mentioned here reliably tends to pair with 'robust' realism and platonism about everything salient to most people, and this entails the positing of separate, hidden realities where 'the real explanations' exist and govern all the ordinary stuff, which we can somehow (?) access. these bb-ers often claim that their principles, values, preferences, stances... are The Objectively Correct ones, and even if they don't know whether they are, they at least 'admit' that such a fixed Proper Order of Reality exists and are virtuously trying to fit their worldviews to The One Correct Order of Reality.
i find this unintelligible; i literally don't understand at all what they mean or why anyone would think that eg. the 'concept of baldness' is a stance-independent, alingual, non-human informational entity existing eternally in a separate realm, or that there are simple, irreducibly normative stance-independent facts about all modalities of aesthetics...
Five years I have studied to discover a cure for stuttering. I gained fluency, but no way to prescribe it to fellow stutterers. Qualia, you people call it. Do you know what it's like to think about how to convey an organism state for years on end with nothing to show for it? You come up with nothing again and again and again. And you see people too removed from the reality of consciousness to see it clearly as it is, because they get too caught up in their language.
Physics saw a steady decline after going to the moon. Give physics an objective together with constraints and physics gives you the atomic bomb in the span of a few years. Give physics no objective and you get theories of everything where you don't know what you're looking for nor what it might look like nor what it might mean for you. Still, physics delivered enough to fuck around a bit. You got neuroscience at Newton's laws of motion and thermodynamics era with Predictive Coding and FEP. You got consciousness theorizers who want to leap from Newton's era to Witten's era, but laws do not bring you to M-theory. Interactions do.
I exercised control during a panic attack and offered a neurophysiological account of my experience not half an hour later. I was immediately grateful to have experienced one because I now understood it from a first-person view. But I also knew I wouldn't get one again because it cost me no instability so I didn't fear it.
LLMs can't teach navigation skills. Our universe is bigger than theirs. My estimation is that consciousness remains elusive. I hope you prove me wrong, o' bullet-biters argghhh!
I loved this article. Elegant and full of great examples/explanations.
About science and particularity: yesterday I was watching Fire of Love, a documentary about a volcanologist couple that visited and filmed well over 100 live volcanoes. At one point Maurice (who did the press tours) is asked about the science of volcanology. And he reports his frustration with the way that his colleagues classify volcanos, insisting that each volcano has its own “personality” and should be studied in itself.
I think some philosophers have this attitude. They are after some other epistemic good besides “explanation” in the form of subsumption to a simplified pattern. They are more like the volcanologists, in love with each volcano, than theorists, trying to find the deep patterns common (or at least common enough) to them all.
I feel like this is a common feature of scholarship in the humanities - it’s all about the particularities of each thing, with a resistance to any sort of universalizing message.
A quick follow up. On second thought, I wonder if some “volcanological” particularists actually see themselves as doing theory-building, just in a more granular way. Then the overfitting charge undoubtedly sticks.
I do take seriously the fact that overfitting seems not to be a problem for prediction in contemporary machine learning with huge datasets. In some domains, you might think we're just not going to get the kind of understanding that subsumption under simple, general patterns provides: they're too complex. Maybe the best we can do in such domains is train the giant neural networks that are our brains to make good judgements using tons of data, even if we can't then articulate how we make our judgements using general principles. Maybe that's how particularists understand ethics.
Overfitting absolutely is a problem in ML, even with a huge dataset.
If your dataset is big enough, the model cannot overfit on all of it because the complexity required exceeds the size of the model. But it can, and will, overfit on parts of the dataset. That's why LLMs will sometimes regurgitate chunks of their training data verbatim. Moreover, if the dataset itself contains certain biases, those biases will be reproduced in the model.
"Seems not to be a problem" is certainly too strong. Maybe I should've said something like: "you'd think overfitting would be a devastating problem for models with trillions of parameters, and it turns out it's not--performance of trillion-parameter models on novel data is often better than performance of billion-parameter models--as long as you're training on large enough datasets that even trillions of parameters aren't enough to fit them entirely by memorization." Does that sound better?
Here, it’s definitely weird but the memorization threshold is not a crucial issue. There are some proofs you can get effective learning even with many more parameters than data points, as long as the regularization is appropriate. Regularization is what prevents memorization in the high-parameter regimes.
The simplest case of regularization, “Tikhonov” L2 regularization, implicitly “multiplies your data points infinitely by adding white noise to each of them.” (It doesn’t explicitly generate the noisy data, but the algorithm result is formally identical to the result of the process that does generate the infinite data.) The algorithm can’t memorize infinite noisy data, and it doesn’t memorize the data points without noise since it “forgets which ones are the originals.”
Huh, cool!
Yes, I think the nontechnical messaging around the technical issue of “double descent” is pervasively confusing. Some people talk about it as if it means the models stop overfitting, but practitioners are more aware that there can still be application-fatal overfitting. A lot hinges on whether the model was regularized “properly”, which amounts to baking in intuitive heuristics like Occam (for L1) or preferences for model robustness (for dropout)… but relating the intuitions to exact application is tricky and those conversations are hard to make clear.
Just a (non-earth) scientist here — how much of this is due to the fact that volcanoes are pretty idiosyncratic objects of study; there aren’t that many of them, no two volcanoes can occupy exactly the same space (relative to ocean, tectonic plates, etc), and so there really isn’t such a thing as a “standard-issue volcano” and developing a Grand Unified Theory of Volcanoes just isn’t the most feasible endeavor?
Compare of course to something like biology where even within some specific sub-population of interest, there are many carbon copy-esque specimens one can study.
I really liked this article. It looks like you present the outline of a solution to the paradox of inference, which IMO is a problem well worth solving.
Plus, Quine, Dummett and Frege references. Yay. Don't think I have seen any of these on Substack before now.
BTW, just in case you wonder, no, I am not a relation of Timothy Williamson.
I don't think I see the connection to the paradox of inference, though I'd love to hear it.
I will have to get back to you when I have a lot of energy, if that's okay.
At least three of my current papers can be summarized as “Williamson is wrong” (about evidence being a proposition, about how far we are from cognitive homes, and about the relationship between knowing how and knowing that). In each case, I think he gives a nice and elegant theory, but most of them just aren’t that useful, and I think that’s the core of this issue. An infinitely flexible account that lets you accommodate everything doesn’t predict anything - but a highly rigid theory that says there is a precise fact of the matter, but it’s just one that we are incapable of knowing anything useful about, is just as useless.
I love this point. In the course of writing my actual review, I find myself trying to voice the following frustration. He's often willing to dismiss this or that phenomenon as noise, which he'll say we could only accommodate with an overly inelegant, overfitted theory. For instance, the difference in cognitive significance between "Clark Kent is Clark Kent" and "Clark Kent is Superman." By his lights, both sentences mean the same thing, both beliefs have the same content, and any differences in the roles they play in our cognitive lives are irrelevant to formal semantics, formal theories of belief, etc.
I find myself wishing he would be more explicit about what it is he thinks formal semantics, formal theories of belief, etc., are supposed to *do*, if explaining those differences isn't a big part of the story.
>"what it is he thinks formal semantics, formal theories of belief, etc., are supposed to *do*"
I think Williamson's metaphilosophy would reject that question as inappropriate. We're only supposed to ask whether the claims of (e.g.) a formal semantic theory are True or False. The pragmatic question of why we care about the theory in the first place is deemed irrelevant to the question of its truth or falsity.
From TPOP (2nd edition, p. 540): "the motivation for asking [a philosophical question] should not be confused with the question’s content."
Tried to reply already but not seeing it, so apologies if this ends up as a double post.
While I see your point, I don't see how to square it with what he says about model building in philosophy. It's a pretty standard line, which I think he accepts, that models aren't supposed to be literally true, but are supposed to illuminate some important patterns/phenomena, especially the sorts of models you see in formal philosophy (as opposed to giant models in meteorology, or the ones the fed uses, which really are supposed to be as accurate as possible).
Once you think of what you're doing as model building, I don't see how you can avoid letting some amount of pragmatism in; you need to distinguish the underlying phenomena you really care about illuminating from the stuff you're willing to regard as noise.
Yes — if you think of model-building as inescapably pragmatic, then assessing the success of a model requires an account of the model's intended purpose; it becomes difficult to reconcile the claim that "philosophy is about model-building" with the methodological constraint that a truth-conditional semantics for philosophical language should be definable independently of pragmatic concerns.
So Williamson must believe he has a conception of "model-building" that isn't inescapably pragmatic. And it would be nice if he would let us know what it is!
The “norms of assertion” stuff is the biggest example of this to me. What is a “norm of assertion” supposed to be if it’s not the full Gricean thing with four out more different categories of features that are differently relevant for different conversations depending on their purposes?
I don’t accept your analogy between scientific and philosophical theorizing, and I doubt that overfitting is even a problem in philosophy. You suggested that we sometimes should bite bullets to increase the predictive power of our theories (or to avoid merely accommodating pre-theoretical judgments). I accept this in the case of scientific theorizing but not in the case of philosophical theorizing. The difference with philosophy is that many philosophical theories are not empirically testable. Often, the only thing we have to rely on in philosophy is our judgments.
Overfitting is bad in science because it leads to poor outcomes. If we overfit a model to the growth rates of particular plants, then it will poorly predict the growth rates of other plants. Crucially, scientists get feedback from the world. If their models are overfitted, then they do not work well. If their models are fitted well, the models lead to successful empirical predictions.
As I understand Williamson, he’s saying that there’s nothing special about philosophical methodology. Just as scientists rely on their judgments, philosophers must rely on their judgments. Presumably, then, overfitting in philosophy is bad for the same reason that it is bad in science: overfitted philosophical theories yield the wrong predictions. The problem is that philosophical theories’ predictions usually don’t have any practical consequences. Whether epistemicism is true, whether numbers exist, whether there are concrete possible worlds, I could not possibly look at the world to determine these things.
Philosophical predictions often do not consist in empirical predictions. They consist in predictions about what further judgments I should make. Pretty much the only way to “test” a philosophical theory is by comparing it against other philosophical judgments I have. This suggests that philosophers only have their judgments, whereas scientists have judgments and empirical evidence. If I am right, then I don’t see a problem with fitting our philosophical theories to our judgments.
I think this a very reasonable line of question, and I see it as in the same ballpark as the topic Andrew Sepielli mentions he's going to cover in his seminar above. In science you can learn you're overfitting by testing a models performance on new data and seeing that it does poorly. In philosophy you generally don't have opportunities for those sorts of clean tests.
I'm not sure if this is just a temperamental or aesthetic judgment--I hope not--but it still seems to me that it's worth aiming for explanation in philosophy, and that requires seeing overfitting as a bad thing. Here's an example I like. It's a consequence of the standard Bayesian formalism for representing belief and belief change, along with some (contestable!) assumptions about the structure of evidence, that absence of evidence is (perhaps small) evidence of absence: if learning E provides evidence for H, then failing to learn E provides evidence against H.
That's counterintuitive--it's a familiar slogan that absence of evidence is not evidence of absence! By my lights, once you see how it follows from some pretty natural assumptions, the reasonable thing to do is to discard your initial judgment that absence of evidence is not necessarily evidence of absence, and to regard yourself as in possession of an explanation as to why it must be. I admit there's no knock down empirical case for this. You can reject the assumptions about how evidence should be formalized that lead to the result. But that seems to me to leave you in a worse place, from the standpoint of the sort of systematic understanding that I take it philosophers tend to crave. But maybe not all philosophers! Maybe if my sensibilities were more particularist, I'd be less sympathetic to the idea that there are any general formal principles about evidence that could lead to surprising consequences like the one just mentioned.
Extremely clear and interesting post. This may sound out there, but I was reminded of a paper by Erik Hoel on dreams and the overfitting brain hypothesis. Hoel argues that the hallucinatory nature of dreams may serve to rescue the generalizability of experience against overfitting, and he draws on work on deep learning to make his point. This was back in 2021, and I wonder how he would construe current-day LLM hallucinations. If he’s right, it may be that hallucinations vs. overfitting are the tradeoff and models pay that price as a byproduct of their attempts to generalize despite having trillions of parameters. Or something to that effect. And the interesting analogy would be: biting bullets is a bit like hallucinating: it will make sense eventually.
The paper: https://www.sciencedirect.com/science/article/pii/S2666389921000647
So I saw this comment previously and wanted to meaningfully circle back, but…who am I kidding; I’m on here to vegetate and not have more papers to take a serious look at, lol.
That said, the guy is a biologist and almost necessarily predisposed to be interested in the ‘neural correlates’ of ANNs if he’s going to talk about AI. Rich history of tortured analogies there, though I certainly admire the dedication of Hinton et al.
My working quibble is that hallucinations (evocative term notwithstanding) and questions of generalization are, while extremely nontrivial, actually quite mundane. Eg, knowing how to drive in Boston after having learned in SF. Not inventing random streets in one’s mental map of Chicago. Dreams can be very mundane, too (eg, imaginary normal conversations w/coworkers), but the author seems all too eager to dismiss dreams w/”high emotional valence” — their frequency, their neurobiological significance, etc. I guess it’s fine if he wants to limit his analysis to the sorts of boring dreams we have about work, but it’s not like Dadaist nightmares about fire-breathing dragons and the end of the world don’t exist for most people. If he doesn’t want to touch anything Freudian-tinged, he needs to specify that he is working with a very limited subset/definition of “dreams” — the mundane unemotional sort — bc for the purposes of his analogy w/emotionless machines, that’s the only sort that works.
When has one bitten a bullet? What’s the literal content of the pain metaphor? Is it just denying something that seems true? So any solution to a philosophical paradox must bite a bullet? I’m trying to figure out what bullet epistemicists have to bite
Tangentially related to this: ‘Precise’ in ‘The book that made his name in philosophy was a defense of the idea that vague terms like “bald” or “tall” have precise boundaries, though we can’t know exactly where those boundaries are’ isn’t right, since Williamson isn’t denying that the boundaries are vague and ‘precise’ is usually just used as an antonym of ‘vague’. ‘Sharp’ might be better.
But in any case, that (e.g.) there is some specific hair such that one is bald iff one has that many hairs or fewer is as much a consequence of supervaluationism and other classical views as it is of epistemicism.
I tend to disagree with Williamson about what "precise" means in natural language. So while I'd stick with his terminology in a philosophy paper, for the sake of glossing his view for a general audience I don't feel obligated to defer to his usage.
And on the idea that other views have to bite some of the same bullets as epistemicism, I absolutely agree! That's part of what I mean by saying he makes a powerful case in the book that it's very hard to give a theory of vague language that avoids commitment to comparably bizarre sounding claims--sometimes the very same ones, sometimes others--to the ones the epistemicist asserts.
‘In some cases, it is either true or false that NN is bald, but “NN is bald” is neither true nor false.’
I think a tension in views which are physicalist-naturalist and bullet biting is that it seems very hard to fit the foundational principles that the physicalist-naturalist use in their bullet biting into a physicalist-naturalist worldview.
I think this tension comes from the fact that the natural sciences, which the physicalist-naturalist argues should be the (sole?) guide to the fundamental state of the world, rely centrally on methods which can't obviously be fit into at least the physicalist worldview, like Bayesian probabilities, causation and mathematics.
There's a second problem for the naturalist that it's unclear how natural sciences can discover the correct methods for doing natural science. I'm much more sympathetic to naturalist bullet biting here, though, than I am of the physicalist position.
Everyone will have to bite a pretty unpleasant bullet when building the foundations of how to acquire knowledge, and circularity isn't obviously much worse than any of the others. It seems like the physicalist, though, doesn't have to be committed to a physicalist ontology, and so commitment to this view looks much more like scientism to me than naturalism does (synthetic a priori knowledge is also a pretty unpleasant bullet to bite!)
I think the naturalist view does fall apart, though, if one wants to use unjustified axioms to ultimately ground one's view. I think in the worst case this is just straightforwardly inconsistent while also being a quite common view amongst lots of people with exposure to the natural sciences (I suspect that lots of LessWrong style rationalists implicitly hold views like this.) Part of the motivation for writing this is also reflecting on my own views. Before getting exposure to the philosophy of science I certainly held foundationalist views about epistemology, and I was certainly sympathetic to the argument that we should strongly defer to findings of natural sciences (and I still think that often we should!)
Hello! Just some loose thoughts — forgive me if insufficiently informed — about what you write here.
- My understanding is that the explanatory and predictive power of highly idealised models in economics is a matter of considerable controversy. Maybe they therefore don’t inform the best case for bullet biting. (Obviously, these models are also controversial in philosophy.)
- With respect to the toy example, suppose we want to predict the betting behaviour of people with cognitive problems, who may not remember which numbers are prime, and who might struggle to apply the definition to cases. Then perhaps logical omniscience is an inappropriate idealisation. You would need to fill in the gaps, to be sure, but would that necessarily amount of overfitting? Or consider explanations of famous reasoning mistakes, like Wason’s selection task. Whether or not we can explain these adequately with a possible worlds model of belief, it would surely be unfair to lump together alternative models with tortured polynomials.
- Sometimes in epistemology it’s helpful to idealise because you want to abstract away from memory and other limitations. That strikes me as especially appropriate because epistemology is, I think, normative: it’s about good reasoning and things like that. If we want to study how knowledge is transferred by testimony, for instance, it makes sense to consider ideal agents, so that knowable transfer is not affected by irrelevant factors. I know that we can sometimes learn about actual reasoning by seeing it as an approximation of normatively ideal reasoning. I am also aware that idealisations can be appropriate even when not modelling normative ideals, but rather modelling actual phenomena directly, as it were. Nonetheless, it seems to me that there is a large space of good science that lies between accepting highly idealised models of belief, and a dithering, unproductive particularism.
- I like an analogy from computer science. You might want to predict how quickly your computer will run a programme as a function of input size. Up to a point you can get by with a fairly abstract RAM model, and there’s nothing wrong with that for some explanatory purposes. (Why does this sorting algo take much longer than this one?) But if you want a highly accurate prediction, or more detailed explanation, you will need to particularise your model, taking into account all the fancy workings under the hood. Even very particular models of specific architectures are powerful and explanatory, as any good systems textbook can attest to. (Of course they too idealise away from all the physics.)
- I guess I sometimes worry that this overfitting rhetoric could degenerate into just that: rhetoric. It would be good to see specific and uncontroversial examples being discussed in detail. (Oh no detail!) I find the metaphor of overfitting a curve to data to be quite unhelpful when adjudicating between, say, different models of computation, or different models of belief or reasoning.
- It seems to me that different extents of idealisation are appropriate for different purposes. Unless one has a reason for systematically privileging the most idealised models as guides to the metaphysics, accepting highly idealised models does not need to come with any bullet biting. (Is my thinking.)
Not sure how interesting or controversial any of that is! But thanks for your post; this stuff has been on my mind lately.
I like the example of the Wason task. I agree it's a good phenomenon to take as an explanandum. My sense is that no extant non-logically-omniscient theories of belief do a good job with it. Of course, you can take an impossible worlds theory of belief and add in by hand all the stuff you'd need--make sure the agent knows that it follows from "you need to be 21 years old to drink" that you need to check both the drinkers, and the <21 year olds to ensure compliance with the rule, but let the agent fail to know that it follows from "odd-numbered cards have a vowel on their obverse side" that you need to check both the odd-numbered cards, and the cards with consonants. But it doesn't fall out of the theory; you could just as easily have predicted the opposite result.
Here's a different example. Take something like prospect theory. It's explicitly meant to be a kind of de-idealized theory of decision-making that can make sense of some phenomena that traditional expected utility theory can't. But it's still constrained enough that it can't make sense of just *any* behavior. I'm (at least tentatively) a fan! My sense--happy to be corrected--is that it's still about the best we've got for making sense of various behavioral phenomena (e.g., loss aversion). I don't know of any comparably illuminating theories of logically non-omniscient belief. That is, theories where there's some robust empirical phenomenon (like loss aversion) where it falls out as a reasonably natural, non-ad-hoc prediction of the theory, rather than something that you can get if you set the parameters just right, but where you could have just as easily and naturally have gotten the opposite prediction if you'd set the parameters differently. I'm of the view that there's some principled reason why there couldn't be such theories, nor am I opposed to the search for them. But I do think conceiving of the constraints in the sorts of terms I've been discussing--ie, you should want a theory where the kinds of logical mistakes we make come out as something predicted by the theory, rather than merely something that can be described in the language of the theory--is fruitful.
Thanks for your response! I think that your framing is very illuminating. There does seem to be a major difference between expecting logical mistakes to fall out of the theory, vs being merely expressible in it.
I still wonder whether we really have to choose here. Again I think that models of computation offer an illuminating analogy. A general model of computation might begin with Turing Machines or RAMs or whatever else. Likewise, a general model of belief might begin with possible worlds. These can offer insights about concrete computers and believers, respectively. But at least in the case of computers, there comes a point where it is unreasonable to expect such models to be of much use. They are too idealised to explain (e.g.) why the same C code runs slower on this machine than on that one, and by roughly how much. At some point, good scientific practice models the finer grain of computer architectures and algorithms. Likewise, I would expect that a mature science of belief will eventually incorporate rich specifications: that much of why and how we reason is best explained in a model that, yes, could have been tweaked any which way. I don’t think this is bad science. As in the case of computers, scientific theorising can head in that direction when prediction and explanation become hungrier for precision and specificity.
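A standard illustration of the kind of fact the idealised model leaves unexplained (my own example, in Python with NumPy rather than C, and deliberately machine-dependent): traversing the same matrix row-wise and column-wise performs the same abstract operations, so a RAM-style model assigns them the same cost, yet the timings differ because of the memory hierarchy the model abstracts away.

```python
import time
import numpy as np

n = 3000
a = np.random.rand(n, n)            # stored row-major (C order)

t0 = time.perf_counter()
s1 = sum(a[i, :].sum() for i in range(n))   # contiguous rows: cache-friendly
t1 = time.perf_counter()
s2 = sum(a[:, j].sum() for j in range(n))   # strided columns: cache-hostile
t2 = time.perf_counter()

assert np.isclose(s1, s2)           # same additions, same result...
print(f"row-wise    {t1 - t0:.3f}s")
print(f"column-wise {t2 - t1:.3f}s")  # ...but typically noticeably slower here
```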
The example you give of prospect theory is a helpful tip! I don’t have the expertise to assess the merits of that programme. I see no reason to depart from your tentative endorsement of it. But I am sceptical that it’s the whole story. I think I stand on reasonably firm ground when I say that, in contrast with (for instance) vision science, the science of belief remains a relatively immature branch of psychology. Actually, vision science is an independently interesting case, because logical omniscience is seemingly not appropriate there, at least not when we expect our model to be metaphysically telling. Many truths are not perceptually representable at all, so we cannot say that perception is closed under logical consequence, or whatever notion of consequence attaches to perception.
I am not a big fan of impossible worlds conceptions of propositional attitudes. I am drawn to neo-Fregean proposals. I have no idea how much these accounts can recover of the explanations that you prefer, nor how easily.
I reckon I need to read your book!
I like the case of computing, but I think it's instructive in part because it makes it easier to describe some important *differences* between folk psychological descriptions of humans on the one hand, and program- or algorithm-level descriptions of computers on the other. Basically, I think program-level descriptions of what computers are doing--even though they involve abstracting away from lower-level implementation detail--still "carve at the joints" much, much more closely than do folk psychological explanations of human thought and behavior. Computers are intelligently designed by humans so that they perfectly implement abstract, program-level descriptions. Humans were designed by evolution, and evolution wasn't trying to make sure that the operations of the brain could be really neatly characterized in terms of propositional attitudes. My read of the LOT tradition in psychology/phil mind is that it takes the opposite view: as we get better at psychology, we're going to find that categories pretty close to our pre-theoretical, common-sense categories for categorizing the mind (categories like "belief") also end up as fruitful categories to use in giving a much more scientifically mature, predictively accurate, computational description of the brain.
Maybe! But it strikes me as pretty optimistic. If you put a gun to my head and forced me to bet, I'd bet that in 50 years we'll have a significantly improved understanding of the brain, and neuroscience, and psychology, but it won't exactly look like a scientifically mature theory of *belief*, and it will still be tricky/a matter of interpretation to translate between what we learn and propositional attitude talk. I don't want to go all the way to the Churchlands' position; I think it's a good bet that propositional attitude talk will continue to provide the best "bang for buck" in terms of how we think and talk about ourselves when we can't put each other in brain scanners and subject the results to extensive analysis.
LLMs are a useful analogy, I think. We know the basic algorithms they're running, and there's nothing like "belief" at the most fundamental level. Nevertheless, it can be useful to conceive of them as having beliefs and desires in trying to predict and explain how they respond to prompts. We know it's not exactly right, but there's probably no simple, illuminating, tractable way of explaining why they responded to such-and-such prompt in such-and-such way that's exactly right. So it may remain a useful heuristic for making sense of them. While I'm sure there are important differences between human cognition and how LLMs work, my guess is that folk psychological descriptions of us are not *much* more joint-carving than they are in the case of LLMs.
I see! My suspicion, for the little it’s worth, is that the folk categories will be replaced by various constructions out of the basic notion(s) of mental representation, where those constructions are more joint-carving than the folk categories. We then have the questions of (i) whether the basic notion(s) of mental representation should be understood hyperintensionally, and (ii) if so, how the representational notion(s) should be modelled. If you agree with Tyler Burge on perception, we seem to require a hyperintensional notion of mental representation to make sense of mainstream vision science. If that is correct, then my bet is that (i) gets an affirmative answer. I have no idea about (ii).
The irony is not lost on me, but I can’t help but think the distinction between fitting and overfitting is not so cut and dried.
Really nice discussion! I haven't read Williamson's book, but I'm interested in the general issue and also trying to find materials for a section of my grad seminar about simplicity and unification in ethics. Does Williamson ever get into why more explanation in ethics is better? Or is that not really the focus of the book?
Definitely not. I'm not sure ethics comes up at all. I think there are pretty high barriers to entry for the book; if you already know the other debates he's contributed to, it's nice to see how he sees them as hanging together. But he covers a lot of ground without really providing much setup, so if you're not already up to speed, I imagine it would be easy to get lost.
The closing sentence is *chef's kiss*.
in my experience, the kind of bullet biting mentioned here reliably tends to pair with 'robust' realism and platonism about everything salient to most people, and this entails the positing of separate, hidden realities where 'the real explanations' exist and govern all the ordinary stuff, which we can somehow (?) access. these bb-ers often claim that their principles, values, preferences, stances... are The Objectively Correct ones, and even if they don't know whether they are, they at least 'admit' that such a fixed Proper Order of Reality exists and are virtuously trying to fit their worldviews to The One Correct Order of Reality.
i find this unintelligible; i literally don't understand at all what they mean or why anyone would think that eg. the 'concept of baldness' is a stance-independent, alingual, non-human informational entity existing eternally in a separate realm, or that there are simple, irreducibly normative stance-independent facts about all modalities of aesthetics...
For five years I have been studying to discover a cure for stuttering. I gained fluency, but no way to prescribe it to fellow stutterers. Qualia, you people call it. Do you know what it's like to think about how to convey an organism state for years on end with nothing to show for it? You come up with nothing again and again and again. And you see people too removed from the reality of consciousness to see it clearly as it is, because they get too caught up in their language.
Physics saw a steady decline after going to the moon. Give physics an objective together with constraints and physics gives you the atomic bomb in the span of a few years. Give physics no objective and you get theories of everything where you don't know what you're looking for nor what it might look like nor what it might mean for you. Still, physics delivered enough to fuck around a bit. You got neuroscience at Newton's laws of motion and thermodynamics era with Predictive Coding and FEP. You got consciousness theorizers who want to leap from Newton's era to Witten's era, but laws do not bring you to M-theory. Interactions do.
I exercised control during a panic attack and offered a neurophysiological account of my experience not half an hour later. I was immediately grateful to have experienced one because I now understood it from a first-person view. But I also knew I wouldn't get one again because it cost me no instability so I didn't fear it.
LLMs can't teach navigation skills. Our universe is bigger than theirs. My estimation is that consciousness remains elusive. I hope you prove me wrong, o' bullet-biters argghhh!