Scientific Judges May Not Be Racist, but They Are Probably Biased.
Can Blinding Them to the Identity of Other Scientists Reduce Racial Disparities?
From the moment the first report came out more than 10 years ago showing that Black biomedical scientists were about half as likely as White scientists to get funding from the National Institutes of Health (NIH), people started to ask questions.[1] Many of them were directed at the scientists—experts in their field—who are invited by the NIH to review and judge funding applications. Were these experts discriminating against Black applicants? Were the all-powerful gatekeepers of health research in the US, and deciders of many people’s careers, racist?
A group of leaders at the NIH recently put this question to the test. They reasoned that, if reviewers were docking applications from scientists who are Black, these applications would fare better if the name of the applicant and institution where they worked were removed so reviewers didn’t know their identity. Meanwhile, applications from White scientists might be expected to score about the same whether or not identifiable information had been struck from them.
The study was more than just an academic exercise for the NIH. A working group for the agency put out several recommendations in 2020 for possible changes to review criteria that might help reduce bias.[2] De-emphasizing the identity of applicants, by making the review process partially blinded, was one of the possibilities floated, and perhaps the biggest break from the status quo. The changes were also meant to simplify how scientists judge other scientists' applications, the so-called peer review, to help them focus on the most important criteria, such as whether the proposed research is important. The findings of the study could fuel an overhaul of how the agency evaluates the tens of thousands of applications it receives every year, through which it awarded nearly $38 billion in 2021 to basic biomedical research projects at universities, institutes and small businesses across the US.[3]
For the study, the NIH team took three sets of applications that had been submitted to the NIH in 2014-2015.[4] These comprised 400 applications from Black scientists, 400 from White scientists that had been picked at random, and another 400 from White scientists that were selected because they were similar to those from Black scientists in terms of area of study, career stage of the applicant, the amount of NIH funding their institution had, and other factors. The team created a second version of each application in which the name, institution and other identifiable information were redacted. Then they asked scientists who had previously served as reviewers for the NIH to score a handful of the applications, some combination of original and redacted versions. They also asked these mock reviewers to guess the race of the applicant for each application they read.
The findings “were very informative in terms of bias, but maybe not in the way that even the people who designed it initially expected it to be,” Bruce Reed, PhD, deputy director of the NIH Center for Scientific Review, who oversaw the study, told me.
The main finding of the study, which was reported last year, was that redaction did not change scores for Black scientists, but it did worsen scores for White scientists (both for the set that was randomly picked and the set chosen to be similar to the Black group). Because the original (not redacted) White applications were more favorably judged by the reviewers than the original Black applications, even those that were presumably similar in merit, the ultimate effect was still that redaction brought the scores for the two racial groups closer to each other.
Redaction seemed to throw off reviewers as to the race of the applicant, at least for Black scientists' applications. In 61% of cases, reviewers guessed incorrectly that redacted Black applications were from White scientists, compared with 36% of the time for original Black applications. In contrast, reviewers mistook White applicants for Black only 2% of the time, for both redacted and original White applications. Reed thinks this difference makes sense: reviewers tend to guess that applications are from White scientists, probably because the NIH generally receives many fewer applications from Black scientists.
The fact that redaction still did not give Black scientists’ applications a boost—even when it caused more reviewers to mistake the applicants for White—suggests that reviewers in this study were not basing their assessment on skin color.
What the study does suggest is that redacted versions of White scientists' applications fared worse than the originals because a lot of information, such as career stage, had been scrubbed from them (along with name and institution). Being more established in their career, working at institutions with more NIH funding, having submitted the funding proposal before and other attributes, which are more common among White scientists' applications, as this study and others show, add shine to applications. When applications left out these details and were pared down to content such as proposed experiments, background and preliminary data, White applicants seemed to take a big hit.
“Race is tied up with a whole lot of other factors, and it’s those other factors that drive the disparities,” Reed said. “It’s wrong to say there is no effect of race, it’s just that it is not literally that the [scientist] is Black or White, it’s all the things that tend to be associated with being Black or White.”
In addition to all these objective measures, such as career stage, Reed thinks another force could explain why redaction hurts White applicants: the halo effect. When a reviewer has a good impression of the applicant or their institution, even if their application is weak, “they give them a break in ways that they wouldn’t for someone that they don’t know,” he said. The study did not address whether reviewers had such reputational bias—such as by asking them if they knew the applicants—but Reed suspects that combining the halo effect with those other advantages could explain all the differences between White and Black applicants’ success.
Although Reed acknowledges that it is human nature to have reputational bias, he said that it is inappropriate for this bias to factor into the judgment of the type of application in this study. These were for so-called R01 grants, which provide hefty funding for a scientist and their lab for about five years; funding decisions should be based on the ideas, not the person. The halo effect is "not fair because it does not let the junior person, the unknown person, who has got a fabulous idea, in the door," Reed said.
Last year, the NIH started offering training on bias awareness and mitigation to scientists before they review applications. Although the training is not required, Reed said that it has been well received so far: reviewers are volunteering to watch it and, in surveys, saying it will help them identify and reduce bias.[5] The Center for Scientific Review is tracking the effect of the training through these surveys and by monitoring reports of bias from reviewers and the NIH staff who oversee groups of reviewers. Nevertheless, numerous studies of bias training have called into question whether it really changes behavior.[6]
The bigger question—whether reviewers should be blind to the identity of applicants—is still being explored. While Reed noted that the study suggests that redaction could potentially shrink some of the gap in scores between Black and White applicants, it is unclear how effective it would be in practice.
As some have pointed out, reviewers would often be able to guess the identity of the applicant.[5] It is true that research proposed in applications can be so hyperspecialized that there are only a few people in the world who have the interest and expertise to carry it out. In the study by Reed and the NIH group, reviewers were able to guess the identity of the applicant for redacted applications 21% of the time.
If the NIH decided to blind peer review, it would have to be the responsibility of the applicant to keep any identifiable information out. For the study, redaction was done after the fact, once applications had already gone through the actual review process; as Reed said, that would be too time-consuming for the reams of applications the agency receives every year. And if applicants knew that reviewers would not see their name, or other features that could potentially strengthen the evaluation, would they find ways to hint at their pedigree in the application?
To see how redaction plays out in a more real-world setting, the NIH is piloting a partially blinded review process for a type of funding called the NIH Director's Transformative Research Awards. These awards are for riskier research than the kind proposed in R01s, and the applications are supposed to contain less experimental nitty-gritty and focus more on why the research is innovative. But just as for R01s, the job of reviewers is to focus on the ideas, not the person. In the process being tested, reviewers only find out the identity of the applicant after they have made a preliminary judgment of the research. Although reviewers can then change their judgment, the hope is that they will be more aware of why they are changing the score and whether bias is the reason. Results from this next test of blinded peer review should come in about a year.
[1] D.K. Ginther et al., Race, ethnicity, and NIH research awards. Science 333, 1015–1019 (2011)
[3] https://www.nih.gov/about-nih/what-we-do/budget
[4] R.K. Nakamura et al., An experimental test of the effects of redacting grant applicant identifiers on peer review outcomes. eLife 10:e71368 (2021)
[6] M.A. Taffe, N.W. Gilpin, Equity, Diversity and Inclusion: Racial inequity in grant funding from the US National Institutes of Health. eLife 10:e65697 (2021)