Welcome to our substack! I’m Dan Greco, a philosophy professor who specializes in epistemology: the branch of philosophy concerned with topics like knowledge, evidence, and rational belief. I've known Matt since back in our days of college debate, when we enjoyed arguing about philosophy, law, economics, and much more. I plan to use this substack as a venue for informal writing, some of which will be on themes I’ve pursued in academic work, and some of which won't. Today I'll start with a post on what role statistical evidence should play in the law, a topic squarely in the intersection of my and Matt's fields. In the future I intend to stray further from our core competencies.
Ever since a famous paper by Larry Tribe, legal theorists and philosophers have worried about what to say about cases like the following:
Blue Bus: Someone is hit by a bus. No identifying information about the bus is observed—nobody sees the license plate, tire tracks, a driver, or anything like that. However, it’s known that 80% of the buses that operate on the street where the victim was hit are owned by Blue Bus Company. This is the only relevant information on which we can judge how likely it is that Blue Bus Company is responsible for the injury.
The standard of proof in civil cases in the US is “preponderance of evidence”. This is usually characterized as meaning greater than 50% probability—if someone can show that their claim is more likely to be true than false, then they’ve met the preponderance of evidence standard. And if that's what preponderance of evidence means, then the case that Blue Bus Company should be held liable for the victim’s injury looks like a slam dunk; given the paucity of evidence, it’s 80% likely that they’re responsible, and 80 is greater than 50. But most people who hear the case think there’s something wrong—unjust, unfair—with finding liability on the basis of this sort of “bare statistical evidence”. So there has arisen a kind of cottage industry of trying to explain how we should understand the civil burden of proof, so as to be able to say that it’s not met in a case like Blue Bus.
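For concreteness, here in symbols is the simple picture that this cottage industry is reacting against (the notation is mine, but it just restates the threshold idea above): the factfinder should find for the plaintiff exactly when the probability of the claim, given all the evidence, exceeds one half.

\[
P(\text{claim} \mid \text{evidence}) > 0.5, \qquad \text{and in Blue Bus: } P(\text{Blue Bus Company is responsible} \mid \text{evidence}) = 0.8 > 0.5 .
\]

On that reading, Blue Bus clears the bar with room to spare, which is exactly what makes the intuitive resistance to a finding of liability puzzling.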
It's not just civil cases that present this sort of challenge. In the US, the standard of proof in criminal cases is “beyond a reasonable doubt.” Suppose that can be understood probabilistically—“beyond a reasonable doubt” means >n% likely, for some n close to 100. (95? 99?) Here’s an abridged version of an example presented by Charles Nesson that raises the same basic issues as Blue Bus:
Prison Yard: 100 prisoners are exercising in the yard. 99 of them get together to attack the guard. One of them stays in the corner and doesn’t participate. This is all observed by another guard on a faraway tower, who’s unable to specify anything about the one prisoner who didn’t participate in the attack. No other evidence exists to help us distinguish the 1 prisoner who didn’t participate in the attack from the 99 who did.
Here, it’s plausible that for each prisoner, given our evidence, there’s a 99% chance they participated in the attack. So can we find all 100 of them guilty beyond a reasonable doubt? If not, would things change if we upped the number to 1,000 prisoners, of whom only one didn’t participate? (Now, for each of them, the chance they’re responsible is 99.9%.) 10,000? If you want to keep answering “no”, no matter how big the numbers get, then you’re probably committed to the idea that we need some way of understanding “beyond a reasonable doubt” that’s fancier than the simple picture where it just means "above some probabilistic threshold."
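To spell out the arithmetic in the example (again, notation mine): with \(N\) prisoners and exactly one non-participant, the evidence treats every prisoner alike, and the shared probability of guilt eventually clears any fixed threshold short of certainty.

\[
P(\text{prisoner } i \text{ participated} \mid \text{evidence}) = \frac{N-1}{N}, \qquad \frac{N-1}{N} \ge t \;\Longleftrightarrow\; N \ge \frac{1}{1-t}.
\]

So a threshold of \(t = 0.99\) is met once \(N \ge 100\), a threshold of \(t = 0.999\) once \(N \ge 1000\), and so on; only a threshold of 1 itself escapes, which is why answering "no" at every \(N\) pushes you off the simple threshold picture.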
It might seem a bit odd to worry about cases like the ones above. They can seem artificial and stylized; creatures of philosophers’ thought experiments rather than pressing real-life questions. As it turns out, fact patterns that look a lot like Blue Bus aren’t all that unrealistic. Here’s a typical case. A plaintiff could prove he had mesothelioma—which is typically caused by asbestos exposure—and that as part of his job for many years he used an arc grinder on brake drums, which often contain asbestos. He could prove the brake drums were manufactured by a specific company. While he couldn't prove that the specific brake drums he worked with contained asbestos—these cases are always brought many years after the fact, as mesothelioma takes a long time to develop—he could prove that the overwhelming majority of the brake drums manufactured by that company at the time did contain asbestos. Should he have been able to recover damages? (In fact, he was able to recover, as were many others in similar situations.)1
Moreover, these cases can be seen as instances of a more general question: whether statistical evidence ought to be treated with suspicion, as somehow different from other sorts of evidence. While cases where the main evidence at issue is statistical present a particularly pure version of this challenge, it comes up in others as well. In one case, a pair of brothers were charged with smuggling opium in the Minneapolis area. Should the prosecution have been able to mention—not as their only evidence, but as one argument among many—that the brothers were of Hmong descent, and that 95% of the opium smuggling cases in the area involved people of Hmong descent? (They weren't allowed to mention it.)
If evidence for a claim is just any collection of facts that raises the probability of the claim, then it's hard to deny that these facts are evidence—far from dispositive, but still relevant—in favor of their guilt. This is easiest to see if you imagine the impact of learning that they're not of Hmong descent. Even if they were at the scene of the crime, and there's some physical evidence linking them to it, if 95% of the opium smuggling cases involve Hmong people and these are a couple of Kenyans, that's some reason to think that they were just in the wrong place at the wrong time. But it's a general principle of probability that if learning E (they're not Hmong) would raise the probability of H (they're innocent), then learning not-E (they are Hmong) must lower the probability of H (i.e., must make it more likely that they're guilty).
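That general principle is just a consequence of the law of total probability. Here is a short derivation, using the same letters as above (H for the hypothesis that they're innocent, E for the evidence that they're not of Hmong descent), and assuming \(0 < P(E) < 1\):

\[
P(H) = P(H \mid E)\,P(E) + P(H \mid \neg E)\,\bigl(1 - P(E)\bigr).
\]

\(P(H)\) is a weighted average of \(P(H \mid E)\) and \(P(H \mid \neg E)\) with positive weights, so if \(P(H \mid E)\) sits above \(P(H)\), then \(P(H \mid \neg E)\) must sit below it: evidence that would raise the probability of innocence must, when it comes out the other way, lower it.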
As it turns out, I like the simple picture that evidence in the law should be understood in terms of probabilistic relevance, and burdens of proof should be understood in terms of probabilistic thresholds.2 So I think we should get out of the business of trying to concoct explanations for what’s wrong, unjust, or unfair with findings of liability—civil or criminal—based in whole or in part on statistical evidence. I recognize that may sound like a big lift. For now, I want to discuss a more abstract question of methodology. The reason is that I think the surprising claims I just made can be supported by independently attractive approaches to thinking about what the law should be.
Casuistry vs. Policy Engineering
Here's one common way of thinking about questions like the ones we've discussed so far. We could neutrally call it the method of cases, or slightly more pejoratively (but still accurately!) the method of casuistry. A casuist starts with intuitive judgments about how various cases—hypothetical or actual—ought to be resolved, and treats these as data. They then try to construct abstract legal principles that best explain these data. They may be willing to reject data—to revise their initial judgment about how some case ought to be resolved, if that's the only way to hold on to an otherwise attractive general legal principle—but there's a strong presumption against doing so. This is the method most writers in the literature on statistical evidence implicitly adopt.
Contrast this with a different method, whose motivating analogy I heard from my co-author on this substack, Matt Wansley. It goes as follows:
Engineering : Natural Science :: Law : Social Science
Engineers take general principles and theories from natural science, and then apply them to figure out how best to fulfill various goals, e.g., building sturdy bridges. Similarly, on this picture, lawmakers should take insights from social science and apply them in the service of fulfilling various goals. What goals? Lots of them—good laws provide people with physical security (e.g., by deterring and incapacitating those who would harm them), make possible various forms of cooperation and coordination (e.g., through contracts), and much more. Let’s call this approach policy engineering.
Before getting into how they help us think about statistical evidence, I want to say a bit about what each method has going for it.
In defense of policy engineering, it’s a way of capturing the extremely natural idea that institutions like laws are merely instrumentally valuable. (This was a theme in Matt's post on the capitalist alignment problem.) Why is it good to live under the rule of law? Not because being subject to laws is intrinsically good. Rather, because there are various other goods—things like life, liberty, and the pursuit of happiness—that we need the rule of law to secure. As David Hume memorably argued, the point of institutions like property rights is that, absent them, nobody would be willing to engage in long-term projects like agriculture; if the reward for tilling, planting, and harvesting is that bandits steal your crops, why do all that work in the first place? But of course, a society where nobody is willing to invest in the future would be immeasurably poorer in all sorts of respects. So we shouldn’t think of property rights as good in themselves; rather, we should think of them as a necessary piece of social technology for enabling people to engage in valuable long-term projects.
What does this have to do with questions about standards of proof and statistical evidence? It suggests that the right way to approach them is to ask what values we want the institutions of criminal and civil trials to achieve and to then use our best social science to figure out which ways of structuring those institutions would achieve the best tradeoffs of the relevant values. It’s not obvious why that process needs to involve anything like the casuistic method of consulting our intuitions about what it would take to do justice in individual cases (hypothetical or actual). To return to the analogy to engineering, abstract intuitions about what sorts of bridge designs would work shouldn't be given much weight in civil engineering; better to reason scientifically from what we know about the properties of various materials to conclusions about what sorts of bridges should be built. If we come up with building designs that don't intuitively seem like they should work, so what? Scientific physics trumps folk physics.
But taking this analogy and running with it is a recipe for unconstrained hubris. If you’ve read your Scott, you’ll probably suspect that a policy engineer trying to reason from first principles about the best way to structure this or that institution—some version of which already exists, and has evolved over time in response to real-world pressures—and given free rein to remake the institution according to his ideal, will tend to make a hash of things. Preventing overly ambitious policy engineering is, I suggest, the proper function of casuistic methods in the law.
Here's a picture of how judges in the common law tradition (ideally) behave. They are trying to do policy engineering, but with a large dose of epistemic humility. As policy engineers, they try to construe the law in such a way that it has good results; so that the law enables cooperation, prevents harm, and the like. But because they're epistemically humble, they give a lot of weight to the fact that similarly situated judges over many generations have found some rule to be workable. If there's a longstanding precedent a judge has the option to overturn, he'll think hard before doing so. (Recall Chesterton's fence.) So rather than treating each case as an occasion to do policy engineering from scratch, they instead assume that the body of case law they encounter (in both its general principles and in how those principles have been applied to particular cases) is reasonable by default. As a result, they prefer to limit their attempts at policy engineering to cases where precedent doesn't speak, or where there are conflicting precedents, and so overturning some precedent is inevitable. This respect for precedent ends up looking a lot like casuistry, in that it treats lawmaking as largely an exercise in extending the law in a way that treats prior decisions as (relatively) fixed points, rather than unconstrained armchair reasoning about how to achieve socially desirable results.
This picture about the function and value of casuistic methods does suggest some differences from how casuistic reasoning about the law is applied in much academic literature, however. Specifically, it justifies caring a lot more about how laws fit with actual precedents than about how they fit with intuitive judgments about hypothetical cases. Why? Because actual precedents have had the opportunity to be tested. If the holding in an actual case leads to absurd results, sooner or later those results will be brought to light in some later case, and the precedent will likely be overturned. If a precedent is longstanding, that suggests it hasn't led to such absurd results. By contrast, judgments about hypothetical cases carry no such guarantee. Some of the principles floated in the literature on statistical evidence, if adopted by judges and made into law, would likely have radically revisionary consequences for the use of DNA evidence, effectively barring it from playing a significant role in criminal prosecutions. Here, while our abstract intuitions about justice may push us in a certain direction, if those intuitions haven't yet been put into law and tested, we have no guarantee that they can integrate with a well-functioning legal system that achieves the sorts of aims policy engineers should want. Returning to the analogy to engineering, we already saw that abstract intuitions about sound design should get little weight. But if some design has actually been used to build bridges, and those bridges have stood for a long time without falling down, then we have much more reason to trust the design. In fact, engineering often precedes science; the Wright brothers achieved powered flight well before anyone had the necessary theoretical understanding of aerodynamics to comprehend how they managed it.
Where does this leave us? Both the policy engineering and casuistic perspectives on lawmaking get something right. The policy engineer is right that institutions like trials and standards of proof are best understood as social technologies whose function is to enable us to enjoy various goods. And the casuist is right that judges should generally not attempt to reason from first principles about which designs for such institutions would best secure those goods, but should instead have a healthy respect for precedent.
Back to Statistical Evidence
What does any of this have to do with statistical evidence? In a nutshell, I'll argue that a naive policy engineering approach provides no reason to see statistical evidence as in any way suspect. And this verdict doesn't change when we move from naive to sophisticated policy engineering by adding in respect for precedent, because precedents concerning statistical evidence don't speak with anything like a consistent voice.
Why does a naive policy engineering approach to thinking about standards of proof and admissible evidence favor welcoming statistical evidence? Because just about any aims you'd want the legal system to achieve are furthered by making it more accurate—more likely to find people guilty/liable when they really did the crime/tort, and less likely when they didn't—and considering statistical evidence makes the legal system more accurate. It's standard to distinguish four purposes of criminal punishment: rehabilitation, retribution, incapacitation, and deterrence. Set aside deterrence for now—we’ll come back to it shortly. Each of those other aims is clearly better served by a more accurate legal system. Innocent people don't need rehabilitation, retribution, or incapacitation, and guilty people do. Civil law, in addition to aiming at deterrence, aims at restitution. And people who don’t cause damages shouldn’t be paying restitution, while people who do, should—accuracy is clearly extremely important.
Deterrence is a bit more complicated. Some writers have argued that considering statistical evidence does not serve the aim of deterrence, in either the criminal or the civil context. The basic worry is that potential wrongdoers generally don't have any control over what statistical evidence suggests about their likelihood of having committed a crime. Let's go back to the Blue Bus case. Suppose Blue Bus Company operates 80% of the buses in town, and Grey Bus Company operates 20% of the buses. If Grey Bus Company is considering whether to take some action that makes accidents more likely, they'll know that whatever they do, the statistical evidence will still tell against their being responsible for accidents. So a legal system that considers statistical evidence might seem to frustrate the aim of deterring Grey Bus Company from causing accidents. After all, if they cause an accident—and if there's only statistical evidence to go on—it will be their competitor who ends up paying the price. One can imagine situations where knowledge that statistical evidence will be used creates perverse incentives.
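To see the worry in numbers (a stylized calculation, not anything from the cases; \(D\) is a made-up damages figure, and I'm assuming liability tracks the bare 80/20 statistics and nothing else): if only statistical evidence is ever available, every accident on the route gets pinned on Blue Bus Company, since the probability of its responsibility always clears the preponderance threshold.

\[
P(\text{Blue is responsible} \mid \text{statistical evidence only}) = 0.8 > 0.5 \;\Longrightarrow\; \text{Blue pays } D, \qquad \text{Grey pays } 0,
\]

whichever company's bus actually caused the accident. On these assumptions, the threshold rule, fed only base rates, gives Grey no marginal liability to fear and makes Blue pay even for accidents it didn't cause.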
While there's much to say here, I'll boldly claim that these problems largely disappear if we make the reasonable assumption that potential wrongdoers are almost never highly confident, ex ante, that only statistical evidence will be available if they commit a crime. The way to see this is to realize that the same objections we can raise to considering statistical evidence can also be raised about other forms of evidence that we’d never think of excluding from trials, such as evidence about motive or opportunity. If I have a rich uncle who's made me his sole heir, I have a motive to kill him. Even if someone else kills him, I'll still be a potential suspect, due to my motive; in that sense, I can't control what the motive evidence says about me, so we might worry that considering motives doesn’t help with deterrence. If the motive evidence will tell against me whether I kill him or not, I may as well go ahead and kill him, right? This argument is clearly unpersuasive, but it’s really not any better than the argument against considering statistical evidence. Allowing the legal system to consider evidence about motives is clearly a good idea, and doesn’t undermine the aim of deterrence. People with strong motives to kill are exactly the sorts of people we should be extra concerned with deterring—they're the ones more likely to kill—so knowing that their motives will mean they'll face extra scrutiny creates stronger reasons for them to avoid creating any further incriminating evidence against themselves, of the sort that they'd create by actually murdering their potential benefactors. To the extent that this means people without motives are deterred less, it's a reasonable tradeoff; it's more important to deter the people who are more likely, ex ante, to commit the crime in the first place. The same reasoning applies when we're considering people who are more likely to commit crimes because of statistical evidence against them, rather than motive evidence.3
More generally, from the policy engineering perspective, it’s hard to see how statistical evidence is importantly different from other kinds of evidence. Yes, if we’re willing to hold people liable—whether civilly or criminally—on the basis of statistical evidence, we’ll occasionally make mistakes. In Prison Yard, the mistake is very salient—one of those 100 people would be wrongly convicted if we decided that 99% probability was enough to meet the threshold for conviction. But the same basic problem arises for eyewitness evidence, motive and opportunity evidence, and any other fallible means of determining whether somebody really did the crimes or torts they’re accused of. Considerations about the badness of wrongful convictions support setting the threshold for beyond reasonable doubt high; it’s much harder to see how they could support restricting what kinds of evidence can be brought to bear in meeting that threshold.
That's my case that if you take the flatfooted policy engineering approach to law, you'll be happy to treat statistical evidence like any other kind of evidence. How do things change if we adopt the method of casuistry? I admit that the cases I started with—Blue Bus, and Prison Yard—seem like cases where finding people liable on the basis of statistical evidence is unfair. Likewise, it seems unfair that just because other Hmong people are disproportionately involved in opium smuggling in Minneapolis, an individual Hmong person in Minneapolis should come under extra scrutiny.
If there were a long and rich tradition of treating statistical evidence with suspicion in the law, then accepting the arguments I've been offering would amount to a radical revision, and a cautious casuist would be loath to do so. But there is, in fact, no such clear tradition; as the seemingly contradictory cases I’ve already mentioned in this post suggest, it’s very hard to extract general principles from precedent here.
Here’s a perhaps cynical hypothesis about what’s going on. In cases where limiting the use of statistical evidence would seriously frustrate the aims of the law—e.g., where it would make it impossible for lots of people injured by asbestos to recover damages—courts have tended to allow the use of statistical evidence, as good policy engineers would. But in cases where accepting limits on statistical evidence seems much less likely to have far-reaching ramifications, courts have been happy to indulge the intuition that doing so is unfair. E.g., in the case of the alleged Minneapolis Hmong opium smugglers, it’s highly unlikely that prosecutors would have been able to clear the reasonable doubt threshold if, but only if, they were allowed to offer statistical evidence about the involvement of Hmong people in the opium trade. Even though the majority of opium smugglers were of Hmong descent, the vast majority of people of Hmong descent were not opium smugglers, so in any individual case, lots of non-statistical evidence would have been required for conviction, and it’s unlikely that the statistical evidence would have often been the difference-maker. But I suspect that if cases like Prison Yard somehow became extremely common—if there were a large group of guilty people that could plausibly only be convicted by taking statistical evidence seriously, with error rates comparable to what we accept with other sorts of evidence (eyewitnesses are notoriously fallible!)—then we’d eventually reconcile ourselves to letting statistical evidence suffice for finding criminal liability.
Perhaps I’m wrong about this. But even if so, I hope the perspective urged in this post may change some minds about what kinds of arguments would have to be made to justify treating statistical evidence as distinctively deficient. It’s not enough to appeal to the intuition that the use of statistical evidence (whether on its own, or as part of a cumulative case) is unfair. Rather, a statistical evidence skeptic should identify some goals we hope institutions like trials will achieve, and show how they’re frustrated by treating statistical evidence just like other kinds of evidence. I haven’t yet seen any persuasive arguments to that effect, and my suspicion is that there are none to be found.
1. You might be wondering whether this case involved market share liability. It didn’t, though the reasoning in the opinion was in some ways similar. More generally, I think the existence of market share liability as a doctrine fits in well with the approach I call “policy engineering” later in this post; it’s not a particularly intuitive doctrine, and it doesn’t follow naturally from earlier precedents. Rather, the doctrine is best understood as a tool to reach for in situations where insisting on proof in the traditional sense would leave large classes of injured parties unable to achieve restitution from the parties that caused the injuries.
2. And in fact, that’s the main driver of my interest in this topic. It’s not because I think we’re currently doing great harm by not allowing statistical evidence in cases where we should—as far as I can tell, in cases where it actually would matter, courts have already generally found ways to incorporate it. Rather, it’s because I think it would be immensely clarifying to understand all sorts of aspects of the law in probabilistic terms, and bad ideas about statistical evidence are one of the main intellectual obstacles to doing so.
3. Is motive evidence a form of statistical evidence? E.g., crimes are disproportionately committed by people with motives to commit them, and the accused belongs to the group of people with motives to commit crimes? Could similar arguments be used to portray any species of evidence as a subspecies of statistical evidence? Perhaps. If you're like me, you think any relevant evidence should be fair game for fact finders to consider, so you don't need to worry too much about taxonomizing species of evidence. But if you want a special ban on statistical evidence, then you need to concern yourself with tricky questions of definition.
I love that you're exploring what kinds of probabilistic rules law does, can, and should use, because I think the topic is important and neglected. Most of your analysis is thoughtful and compelling. My one objection is that you seem to have left out the most important reason for trying to make laws mostly return results that line up with our pre-existing moral intuitions: we as a society rely on a sense that the law is "just" or "fair" in order to enforce the law. You can get away with occasional minor deviations, but if the law diverges too widely or too often from what people think is fair, then they won't be adequately motivated to follow the law. There's a little bit of intellectual motivation to follow the law out of an explicit belief that law-following will result in better outcomes for society, but mostly people follow the law out of an intuitive or emotional sense that the law embodies justice. When people stop feeling that the law is a good approximation for justice, they stop supporting law enforcement.
Interesting post. For street stops, the standard established in Terry v. Ohio is reasonable *articulable* suspicion, which rules out purely statistical evidence, no matter how overwhelming. My reading of this is that we do not want innocents to be punished simply because they belong to a group whose members are often guilty. I think this principle is admirable.
Further discussion of the Terry decision and related issues here:
https://onlinelibrary.wiley.com/doi/full/10.1002/pam.22527