Forgive me for the following hand-wavy explanation: I sort of understand statistical regression in the abstract, but I couldn’t run one myself.
Regressions are mathematical tools that help match like to like. They’re the method by which the authors control for things like criminal history and severity of the crime. The basic idea is that there’s some function that produces the measured outcome, in this case the length of a prison sentence. The function is something like:
(number of prior offenses)*x + (severity of offense)*y + (type of crime)*z + (some remaining unexplained factor) = (length of sentence)
That remaining unexplained factor is the error in the function. If we know the other things, we can guess the length of the sentence plus or minus that remaining bit. We can add other factors to try to shrink that remaining unexplained factor. Maybe the length of a sentence is affected by where in the country the case took place, or how good a lawyer the defendant had, or when the judge last ate. Adding those things to the function would reduce the error, i.e. if we know them for a case, we can guess the prison sentence better. Other things we could add probably wouldn’t help: the defendant’s blood type, the closing price of the S&P that day, whether the Red Sox won their last game.
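To make the “unexplained factor” idea concrete, here’s a toy sketch in Python on made-up data (not the study’s data). Sentences are generated from prior offenses, severity, and a hypothetical lawyer-quality score, plus random noise. We fit the function by least squares with and without each extra factor and compare how much unexplained error is left. Lawyer quality should shrink it a lot; a blood-type column that has nothing to do with sentencing by construction should barely move it. All the coefficients and variable names are invented for illustration, and `ols` is a bare-bones solver written just for this sketch.

```python
import random

def ols(rows, y):
    """Least-squares fit: solve the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= f * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]

def unexplained(rows, y):
    """Residual sum of squares: the error left over after the best fit."""
    b = ols(rows, y)
    return sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2
               for r, yi in zip(rows, y))

random.seed(0)
cases = []
for _ in range(2000):
    priors = random.randint(0, 5)
    severity = random.randint(1, 10)
    lawyer = random.random()          # hypothetical 0-1 lawyer-quality score
    blood_o = random.randint(0, 1)    # irrelevant to sentencing by construction
    months = 3 * priors + 6 * severity - 12 * lawyer + random.gauss(0, 4)
    cases.append((priors, severity, lawyer, blood_o, months))
y = [c[4] for c in cases]

def design(cols):
    """Intercept column plus the chosen factor columns."""
    return [[1.0] + [c[i] for i in cols] for c in cases]

base = unexplained(design([0, 1]), y)            # priors + severity
with_lawyer = unexplained(design([0, 1, 2]), y)  # ... + lawyer quality
with_blood = unexplained(design([0, 1, 3]), y)   # ... + blood type
print(round(with_lawyer / base, 2), round(with_blood / base, 3))
```

In least squares, adding a column can never increase the leftover error, so the blood-type fit still shaves off a sliver; the point is how little.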
Regressions are how we find this function: they pick the weights (the x, y, and z above) that make it fit the real cases as closely as possible. We take a bunch of cases, pull out all the information, and see how much each piece contributes to removing the unexplained part, i.e. if we know some piece of information about a case, how much better can we guess the outcome? Here, we’re trying to see how much of the difference is explained by the defendant’s race.
We could do as you suggest, and limit the study to “first time offenders with similar backgrounds who have clean records”. We could do a thousand separate studies: first time offenders with background x, second time offenders with background x, first time offenders with background y, second time offenders with background y, etc. But regressions let us look at all those cases at the same time. Assuming that race is a factor that operates similarly in all those cases, we can get an idea of the general impact of race.
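Here’s the same idea applied to the race question, again on invented data. The toy sentences have a race effect of 4 months baked in (a number I made up, not anything from the study), and instead of slicing the data into a thousand subgroups, one regression with prior offenses, severity, and a race indicator is fit to everything at once. The race coefficient should come back close to the planted value. As before, `ols` is a minimal least-squares solver written for this sketch, not any particular statistics package.

```python
import random

def ols(rows, y):
    """Least-squares fit: solve the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= f * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(1)
rows, y = [], []
for _ in range(2000):
    priors = random.randint(0, 5)
    severity = random.randint(1, 10)
    black = random.randint(0, 1)  # 1 = black defendant, 0 = white defendant
    # Made-up sentencing rule with a planted 4-month race effect.
    months = 2 + 3 * priors + 6 * severity + 4 * black + random.gauss(0, 4)
    rows.append([1.0, priors, severity, black])
    y.append(months)

# One pooled fit over every combination of priors and severity.
intercept, b_priors, b_severity, b_black = ols(rows, y)
print(round(b_black, 1))  # near the planted 4; the exact value depends on the noise
```

The catch is the assumption from the paragraph above: a single race coefficient is only a fair summary if race works roughly the same way across all those subgroups.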
It should be acknowledged that this method isn’t perfect. We could be missing variables, and it’s always possible that even where race helps reduce the error, it’s doing so by acting as a proxy for something else that we haven’t included. For example, if this study didn’t control for geographic area (it does), race might be acting as a proxy for geographic area, e.g. if most black defendants come from certain areas, and those areas also have tougher sentencing in general. But it’s still a reliable method, and when we have hypotheses about other things that could be producing what looks like racial bias, we can plug them into the regression and see if they remove the effect of race. In this case, the study controlled for geographic area, and race still plays a role in determining sentence length.
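A toy version of the proxy problem, on invented data: sentences depend only on severity and a hypothetical “strict region” flag, and race has no direct effect at all by construction, but black defendants in this made-up sample are more likely to come from the strict region (70% vs. 30%, an assumption for illustration only). Leave region out and the race coefficient picks up a spurious effect; put region in and it falls to roughly zero.

```python
import random

def ols(rows, y):
    """Least-squares fit: solve the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= f * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(2)
rows_no_region, rows_with_region, y = [], [], []
for _ in range(2000):
    strict = random.randint(0, 1)  # hypothetical strict-sentencing region
    # Race is correlated with region, but never enters the sentencing rule.
    black = 1 if random.random() < (0.7 if strict else 0.3) else 0
    severity = random.randint(1, 10)
    months = 10 + 8 * strict + 6 * severity + random.gauss(0, 4)  # no race term
    rows_no_region.append([1.0, severity, black])
    rows_with_region.append([1.0, severity, black, strict])
    y.append(months)

race_without = ols(rows_no_region, y)[2]  # race soaks up the region effect
race_with = ols(rows_with_region, y)[2]   # region controlled: race effect ~0
print(round(race_without, 1), round(race_with, 1))
```

This is exactly the check described above: if adding the suspected confounder makes the race effect vanish, race was acting as a proxy; if the effect survives, that explanation is ruled out.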
Again, we’re trying to compare like to like. Since we know that the criminal justice system produces vastly different outcomes for men and women, we need to control for that variable and compare male black defendants to male white defendants, and female black defendants to female white defendants. You’re assuming that “they had no case to unfold regarding racial disparities in treatments of women”, but that’s unfounded: the study doesn’t say that. It says it focused on men because the two populations are different, and men are 80% of prisoners.
As for why defendant sex wasn’t just plugged into the regression, it could be that sex is such a major factor in sentencing that it’s effectively not the same process. It could also be that there weren’t enough cases involving women with the relevant data available to include them in any meaningful way (for example, data from some states was excluded). It could be that intersectionality matters, and race really does do something different for women than it does for men. I don’t think we can assume that, and in any case it isn’t necessary to speculate in order to interpret this study: it provides strong evidence that black men are given longer prison sentences because they are black.
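If race does work differently for men and women, the usual way to let a regression see that is an interaction term (race × sex) rather than a single race coefficient. A sketch on invented data: the race effect is set to 4 months for men and 1 month for women (made-up numbers), with men as 80% of cases to mirror the prison-population figure above. The fitted race coefficient estimates the effect for women, and race plus the interaction estimates it for men.

```python
import random

def ols(rows, y):
    """Least-squares fit: solve the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= f * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(3)
rows, y = [], []
for _ in range(4000):
    male = 1 if random.random() < 0.8 else 0
    black = random.randint(0, 1)
    severity = random.randint(1, 10)
    # Planted effects: race adds 1 month for women and 1 + 3 = 4 months for men.
    months = (5 + 6 * severity + 10 * male + 1 * black + 3 * black * male
              + random.gauss(0, 4))
    rows.append([1.0, severity, male, black, black * male])
    y.append(months)

b = ols(rows, y)
race_women = b[3]         # race coefficient: effect when male == 0
race_men = b[3] + b[4]    # race + interaction: effect when male == 1
print(round(race_women, 1), round(race_men, 1))
```

Note that the women’s estimate rests on only about a fifth of the cases, so it is noisier, which is one reason a thin sample of women can make this kind of analysis hard.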
They aren’t. After noting that a specific data set only indicates severity directly as a distinction between misdemeanors and felonies, the report notes that “charges are simply recorded as the detailed section of the criminal code a defendant is charged with violating”, i.e. the law that the defendant broke. The authors use those code sections to assess severity. This doesn’t resolve all ambiguity, but it gives a much finer-grained severity assessment than the misdemeanor/felony distinction.
As you note, this requires making “realistic assumptions” about how those code sections are applied. They assume, for example, that most people convicted under a specific code section are not being sentenced based on some obscure aggravating factor mentioned in that section. The assumptions are realistic in that they reflect how the code section is most likely to be applied.
I get why you are upset about assumptions, but you can’t not make assumptions (I made this same mistake in my post above, but I caught it a few days ago!), and they state what their assumptions are. The right criticism can’t be “bullshit, they’re making assumptions!”; it needs to be “this specific assumption is unreasonable for these reasons, and if it’s false, it undermines the findings in these ways.” There’s just no way to “have all the blanks filled in by factual data”; there are always blanks in any causal story.