A wide array of federal mandates have a profound impact on the use of racial and ethnic categories in biomedical research, clinical practice, product development, and health policy. Current discussions over the appropriate use of racial and ethnic categories in biomedical contexts have largely focused on the practices of individual researchers.
By contrast, our discussion focuses on relations between the daily practices of biomedical professionals and federal regulatory mandates. It draws upon the legal doctrine of equal protection to move beyond such debates and to propose guidelines to address the structural forces imposed by federal regulations that mandate how data about race and ethnicity are used in biomedical research. It offers a framework to manage the tension involved in using existing federally mandated categories of race and ethnicity alongside new scientific findings about human genetic variation.
Current discussions about the appropriate use of racial and ethnic categories in biomedical contexts have largely focused on the practices of individual researchers. Individual research, however, takes place within larger structural contexts that shape how and when such categories get taken up, circulated, and applied. In particular, more consideration needs to be given to the impact of federal regulatory mandates and incentives on how biomedical professionals use racial and ethnic categories. Prominent among these mandates are requirements to use the social categories of race and ethnicity provided by the Office of Management and Budget for the collection of data for publicly funded research. Use of such social categories is heading for a collision with diverse categories of population that are classified in federally maintained genetic databases. As genetic information becomes increasingly central to an ever-widening array of biomedical enterprises, the danger of improperly confusing or conflating social categories of race and ethnicity with genetic categories of population rises accordingly. Drawing analogies to the legal doctrine of equal protection, we offer a preliminary framework to begin discussion on how best to manage or avoid such collisions.
The recent Food and Drug Administration (FDA) approval of the drug BiDil with a race-specific indication to treat heart failure only in African Americans has brought to the fore a host of issues related to the use of racial and ethnic categories in biomedical research and drug development.1 Because the BiDil application was premised on the activity of the drug at the molecular level in the trial subjects, the FDA approval has, in effect, given the imprimatur of the federal government to the use of race as a biological category.2 Ironically, the FDA approval was based on a trial—the African-American Heart Failure Trial (A-HeFT)—that enrolled only self-identified African Americans. The results of this single-race design therefore precluded the investigators from making any claims regarding whether BiDil works differently in self-identified African Americans than in anyone else.3
The race-specific design of the A-HeFT trial is inextricably linked to the fact that its sponsors obtained a race-specific patent in 2000 for the use of BiDil in African Americans.4 In granting the patent, the US Patent and Trademark Office provided an additional federal stamp of approval on the implicit use of race as a biological category. The federally granted patent also provided a powerful commercial incentive for the race-specific design of A-HeFT.2
The story of BiDil is significant because it marks the first race-specific application to the FDA. More broadly, it brings into high relief a powerful dynamic whereby federal regulatory incentives and directives promote the increasing use of racial and ethnic categories in a biomedical context. In the case of efforts to address well-documented disparities in health outcomes, such use, although complicated, does not necessarily imply a biological or genetic difference between races.5 In the context of seeking the causal molecular basis for certain diseases, as in much drug development, the use of racial and ethnic categories as surrogates for genetic markers presents more problematic issues. Some researchers believe correlations between racial/ethnic and genetic categories can serve as useful research tools6,7; others contest the rigor and utility of such purported correlations, arguing that they risk naturalizing race and ethnicity as somehow genetic.8–10 When federal approval is sought for such uses, the power of the state becomes implicated in marking racial or ethnic differences as genetic.
Over the past several years, recurring controversies have arisen among scientists and biomedical professionals regarding the nature of the relation, if any, between genes and race.11 A host of articles has been published in the attempt to help biomedical researchers clarify their use of the concepts of race and ethnicity in general9,12,13; some specifically relate to genetically based concepts of population.14–16 Several biomedical journals have published policy statements or guidelines concerning the use of racial and ethnic categories.17–19 These articles and related debates over how, when, or whether to use race and ethnicity in biomedical research are targeted at the practices of researchers themselves.
To date, however, such articles have largely overlooked the fact that research practices involving the use of racial and ethnic categories are profoundly shaped by federal regulatory incentives and guidelines. Thus, before proceeding with further debate about their own scientific practices, biomedical researchers and clinicians need to consider more fully and systematically the role of the federal government in shaping such practices. The recent proliferation of biomedical research that uses race and ethnicity as variables did not spontaneously emerge from a sudden discovery of their relevance. Rather, from funding requests to drug approval and market protection, specific federal initiatives mandating the use of such categories have played a critical role in promoting their inclusion as variables in biomedical research.
Prominent among these federal mandates are the National Institutes of Health (NIH) Revitalization Act of 1993, which directed the NIH to develop guidelines for including women and minorities in NIH-sponsored clinical research,20 and the Food and Drug Modernization Act of 1997, which directed the FDA to examine issues related to the inclusion of racial and ethnic groups in clinical trials of new drugs.21 Pursuant to these mandates, the NIH and FDA have issued detailed guidelines and guidance mandating certain procedures and practices concerning the inclusion of ethnic and racial minorities in clinical trials.22,23
Thus, for example, the NIH “Policy on Reporting Race and Ethnicity Data” states, inter alia, that the “NIH requires all grants, contracts, and intramural projects conducting clinical research to address the Inclusion of Women and Minorities. . . . Investigators are instructed to provide plans for the total number of subjects proposed for the study and to provide the distribution by ethnic/racial categories and sex/gender.”24 Similarly, the FDA recommended that individuals or corporations submitting drug approval applications “collect race and ethnicity data for clinical study participants.”25 These mandates impose significant requirements and provide incentives to identify and collect research data according to categories of race and ethnicity.
The federally mandated racial and ethnic categories, however, are not biomedical in origin. Rather, they derive from the 1997 “Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity” published by the Office of Management and Budget (OMB).26 These standards set forth 5 minimum categories for data on race: American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White. There are 2 categories for data on ethnicity: Hispanic or Latino and Not Hispanic or Latino. These categories provide the basis for the classification of all federal data on race and ethnicity, most notably, the census.
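For readers who work with enrollment data, the category scheme just described can be written down as a small data structure. The following is a minimal sketch in Python, not an official schema: the category labels come from the standards as quoted above, while the record type, its field names, the `tabulate` helper, and the choice to allow more than one race value per participant are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Race(Enum):
    # The 5 minimum race categories named in the text above.
    # These are social classifications, not genetic groupings.
    AMERICAN_INDIAN_OR_ALASKA_NATIVE = "American Indian or Alaska Native"
    ASIAN = "Asian"
    BLACK_OR_AFRICAN_AMERICAN = "Black or African American"
    NATIVE_HAWAIIAN_OR_OTHER_PACIFIC_ISLANDER = "Native Hawaiian or Other Pacific Islander"
    WHITE = "White"


class Ethnicity(Enum):
    # The 2 ethnicity categories named in the text above.
    HISPANIC_OR_LATINO = "Hispanic or Latino"
    NOT_HISPANIC_OR_LATINO = "Not Hispanic or Latino"


@dataclass(frozen=True)
class ParticipantRecord:
    # Hypothetical record type; field names are illustrative only.
    participant_id: str
    races: frozenset  # one or more Race values (multi-selection is an assumption)
    ethnicity: Ethnicity


def tabulate(records):
    """Count participants in each (ethnicity, race) cell.

    A participant with several race values is counted once in each cell.
    """
    counts = {}
    for rec in records:
        for race in rec.races:
            key = (rec.ethnicity, race)
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Keeping race and ethnicity as two separate fields mirrors the two-question structure of the standards. Nothing in this structure carries biological or genetic information about a participant.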
The OMB standards, however, contain an important caveat: “The racial and ethnic categories set forth in the standards should not be interpreted as being primarily biological or genetic in reference.”26 These categories were developed to serve social, cultural, and political purposes. When the federal government requires biomedical researchers and clinicians to import these social categories into explicitly biological or genetic contexts, it is creating a structural situation in which social categories of race and ethnicity may easily become confused and may be conflated with biological and genetic categories in day-to-day practice.
Since the advent of the federally sponsored Human Genome Project in 1990, increasing knowledge of genetics has been transforming biomedical research. This research, however, often involves protocols that have been designed in response to federal mandates to incorporate the social categories of race and ethnicity defined by the OMB. The protocols compel researchers and clinicians to juggle genetic categories alongside racial and ethnic categories in the same conceptual and physical space. This creates a situation that facilitates and even promotes the conflation of genetic categories of population with social categories of race and ethnicity—it is an accident waiting to happen.
Existing federally supported genetic databases add to the confusion through their own problematic uses of racial and ethnic categories. Thus, for example, the National Institute of General Medical Sciences/Coriell Cell Repository maintains a Human Variation Collection of genetic samples organized into the following broad categories: North America/Caribbean, South America, Europe, Asia/Pacific, Africa, and Middle East. Within these broad categories, subdivisions are made with diverse and potentially inconsistent classifications that include White, Basque, Mexican American Community of Los Angeles, Southeast Asians (excluding Japanese and Chinese), Quechua—South Central Andes of Peru, Africans South of the Sahara, Ashkenazi Jews, Czechoslovakian, and Northern European.
Implicated in these various categories are sometimes-overlapping concepts of ethnicity, race, continental geography, regional geography, geopolitical nation-states, urban ethnic communities, religion, geographic isolation, and endogamous indigenous populations.27 (The contingency of the Czechoslovakian category is particularly notable, because there is no longer a geopolitical entity known as Czechoslovakia.) The National Center for Biotechnology Information maintains a separate database of genetic information known as dbSNP, which similarly organizes its data into population classes that mix geography, nationality, race, and ethnicity.28
A major new federal initiative, the International Haplotype Map Project,29,30 promises to exacerbate this problem. The project has devoted more than $100 million to charting blocks of genetic variation in the human genome.31 This otherwise-laudable effort, intended to help researchers identify genetic variations related to health and disease, may inadvertently be opening the door to further confusion of racial and ethnic categories with genetic groupings. The initial phase of the project has been structured around 270 tissue samples taken from Yorubas in Nigeria, Japanese, Han Chinese, and individuals of western and northern European descent in the United States. The resulting blocks of variation are being identified with their source population.32 The population groups are already being characterized as representative of the broad continental population groups of Africa, Asia, and Europe.32 The stated rationale is that although “most of the common haplotypes occur in all human populations . . . their frequencies differ among populations. Therefore, data from several populations are needed to choose tag SNPs [single nucleotide polymorphisms].”32 One can readily see how such genetic categories are ripe for conflation with the social/bureaucratic categories of race and ethnicity promulgated by the OMB.
As population-identified genetic information increasingly comes online for use from the Haplotype Map Project and other federally maintained databases, the need to provide a structuring mechanism to keep genetic categories in a socially responsible and scientifically appropriate relation to social categories of race and ethnicity will become ever more pressing.
The various attempts to provide guidance to researchers on how, when, and whether to use race and ethnicity in their work are important, but they are not enough. Researchers can and should be able to decide how they choose to pursue their particular research agendas. But for years now, researchers and clinicians have been working under a variety of federal mandates that influence how, when, and whether they use racial and ethnic categories in their work. The time has come to examine those mandates and focus on them—rather than on the researchers and clinicians—as targets for constructive intervention.
Previous attempts to articulate best practices for using racial and ethnic categories in biomedical research and clinical practice have largely involved discussions among social scientists, natural scientists, and medical professionals about how best to characterize and manage the social and scientific meaning of these categories. Largely absent from these considerations, however, has been an alternative approach with a long tradition of assessing how best to characterize and manage such classifications in a regulatory context: equal protection law.
Equal protection doctrine derives from the 14th Amendment to the US Constitution, which declares that “no State shall . . . deny to any person within its jurisdiction the equal protection of the laws.” Equal protection doctrine is used to evaluate state-mandated use of racial categories in areas such as school desegregation and affirmative action. Although this doctrine is not necessarily directly applicable to the context of federal practice guidelines or regulatory approvals,33,34 over the decades, courts and legal commentators have devoted considerable attention to developing guidelines and standards to assess and evaluate how racial and ethnic categories may be used appropriately to achieve specific goals. Under equal protection doctrine, race is considered to be a suspect classification because of a US history of racial oppression and the structural vulnerability of racial minority groups. Therefore, the state must justify the use of a racial classification by demonstrating that the classification is “narrowly tailored to serve a compelling state interest.”35 This is called strict scrutiny. It requires a tight fit between the classification and the purpose or interest it serves to force out potentially invidious motivations behind the use of race in law.
Concepts from equal protection analysis may be adapted to a biomedical context through comparison with biomedical analogues already in use in federal regulation of racial and ethnic classifications in research and clinical practice. Thus, for example, NIH guidelines for grant applicants and contract solicitations already require the inclusion of “a description of plans to conduct analyses to detect significant differences in intervention effect by sex/gender, racial/ethnic groups, and relevant subpopulations, if applicable [italics added].”36 The guidelines go on to define significant difference as “a difference that is of clinical or public health importance, based on substantial scientific data [italics added].”36 Similarly, the guidelines require such submissions to “include a description of plans to conduct valid analysis by sex/gender, racial/ethnic groups, and relevant subpopulations, if applicable [italics added].”36
Significant difference and valid analysis, like the equal protection standard of “narrow tailoring to serve a compelling state interest,” involve terms of art that have been used constructively to manage racial and ethnic categories in diverse contexts. They have been defined over time and applied through an accretion of understanding, practice, and interpretation developed by the relevant professional communities. The model of equal protection analysis can be adapted to a biomedical context by using analogous concepts such as significant difference and valid analysis to evaluate the rigor and legitimacy of uses of racial/ethnic classifications in relation to genetics.
Equal protection doctrine thus provides a useful model for developing guidelines to improve already existing and pervasive federal mandates governing the management of race and ethnicity in regulatory contexts. In addition to exposing possible invidious motives, heightened scrutiny can bring to light well-intentioned but careless or inconsistent use of racial and ethnic classifications.
To this end, I offer the following preliminary recommendations to consider in revising relevant federal mandates to address the use of race and ethnicity in biomedical research and clinical practice. They are organized sequentially to parallel a general research plan of project conceptualization, design, and implementation. These recommendations might be thought of as a regulatory analogue to the sort of guidelines on the use of racial and ethnic categories currently being considered and adopted by some biomedical and scientific journals. They attempt to adapt or transpose the conceptual apparatus of equal protection law into the domain of biomedical research and clinical practice. I hope that they will provide the groundwork for further discussion of how federal mandates might be revised to help biomedical professionals keep genetic categories of population and social categories of race and ethnicity in a constructive relation to one another.
Federal regulations, mandates, guidelines, or other similar directives relating to federal funding, regulatory approval, or intellectual property protection for biomedical research and related products should be revised to require applications and related documents submitted to federal agencies that use or make claims on the basis of racial or ethnic categories to include the following:
Require a clear definition of the source of any population category being used, its scope, and its limits. Specify whether or to what extent shared biology or genetics is presumed to underlie the population classification chosen and the degree to which the classification also implicates nonbiological values (e.g., nationality, race/ethnicity, religion, mere proximity).
Require a clear recognition of the requirements of the OMB revised standards regarding the selection and use of racial/ethnic categories and an explicit statement of the social basis of those categories. This may take the form of including the OMB caveat: “The racial and ethnic categories set forth in the standards should not be interpreted as being primarily biological or genetic in reference.”
The OMB revised standards establish basic categories of race and ethnicity, but they do not dictate specifically how those categories are to be used or interpreted in different contexts. Thus, in practice these categories are often merely starting points and are often elaborated upon and modified. The requirement of definition allows researchers and clinicians to adapt these categories to their own particular needs. It also ensures that from the outset such adaptation does not involve an inadvertent or inappropriate conflation of social categories of race and ethnicity with genetic population groupings.
Require articulation of the rationale for the particular population grouping(s) being used. Require articulation of the relation between the actual sample being used and the population category in which it is being placed. Specifically require articulation of the nature or degree of representativeness being asserted for the sample in relation to the population category chosen.
For genetically defined population categories, require clarification of the justification for any concurrent use of nonbiological values, such as geopolitical nation-state boundaries or cultural groupings, to specify the location of descent populations. Nation-states may be used to describe geographic regions of the world from which certain populations recently descended, but such correlations must be justified and refined to clarify discontinuities between the nation-state boundaries and relevant geographic regions.
Require articulation of and justification for any relation asserted between any population-based genetic categories and any racial/ethnic categories. In particular, where appropriate, require articulation of whether race is being used as a risk factor or as a risk marker for a particular biomedical condition.
The federal mandates create a powerful incentive for using racial and ethnic groupings to structure data and research or trial design. Once population groups and racial/ethnic groups are defined, it is important to require a clear articulation of how and why such categories are used in the trial or research project.
Particular problems may arise where a relatively small sample size comes to stand as a proxy for successively larger groups. Thus, for example, the Haplotype Map Project sample of 45 Han Chinese in Beijing (a geographically situated ethnic group) may come to stand for all Chinese people (a historical geopolitical group) and then for all Asians (a continental group). Indeed, this has already occurred: the International Haplotype Map Project Consortium itself has referred to these samples as simply being from a part of Asia.30 This type of sequential expansion of correlation should be explicitly justified at each step.
Require a tight fit (a) between the population, racial/ethnic, and genetic categories being used and (b) between the genetic category identified and the disease state/health issue or other biological activity being analyzed.
The tightness of the fit may be assessed by considering whether the relation is based (a) on a significant difference (or identity) between the racial/ethnic and genetic categories used and (b) on a valid analysis that connects both the relevant genetic category and its racial/ethnic correlate to the identified disease state or other biomedical condition.
Where race or ethnicity is being used as a risk factor, require a tight fit between the aspect of race or ethnicity identified as a risk factor and causal aspects of the condition.
Where race is being used as a risk marker, require an explicit articulation of the nature of the correlation asserted between the marker and the identified condition. Require the specification that such a correlation does not speak to underlying causal aspects of the condition.
One of the most imposing challenges in using racial and ethnic categories in biomedical contexts is preventing a sort of conceptual slippage that occurs through the elaboration of excessively attenuated relations between racial and ethnic categories and purported biological and genetic correlates. Harking back to the example of the 45 Han Chinese who come to stand for all of Asia, imagine further that this group of 45 is identified as having a particular frequency of a specific genetic marker that correlates with a higher likelihood of having a particular genetic variation, which in turn further correlates with a higher likelihood of contracting a particular disease at some unspecified time in the future. This disease, in turn, may have multiple causes and be manifested in various forms with differing degrees of severity. This attenuated correlation becomes even more problematic when one realizes that the initial OMB-defined category of race itself is not tightly bounded in a social context but involves the use of proxy markers and historically contingent conceptions of racial identity that have changed substantially over time.37
It should also be noted that when differential health outcomes are being studied, the fit between racial/ethnic categories and biology will tend naturally to be very tight. For example, in health-disparities research on the biomedical impact of differential access to medical care among specified African American, Hispanic, Asian, or White populations, the fit between racial/ethnic categories and the biological health outcomes would be one of almost perfect identity.
Issues of fit will become more central in assessing projects that use race and ethnicity as proxies to uncover purported underlying genetic causes of disease.
Require a substantial health or scientific interest to be furthered by the use of racial or ethnic categorization in this context.
The diverse federal mandates requiring the organization of data by race and ethnicity create an incentive to use the data thereby produced—whether or not they are directly relevant to the project at hand. Requiring the furtherance of a substantial health or scientific interest ensures that correlations between racial and ethnic categories and genetic categories will not be asserted post hoc with minimal justification. The standard of substantial interest is somewhat less rigorous than the compelling interest required under equal protection law. The rationale here is to recognize that biomedical research and clinical practice generally use racial and ethnic classifications for benign purposes.
These requirements must be met for each use of racial and ethnic categories throughout the relevant project or practice.
This is another deterrent to slippage. One common pitfall of existing approaches to using racial and ethnic categories in biomedical contexts is that researchers and clinicians may issue a sort of general disclaimer up front about race and ethnicity being social categories but then proceed through the rest of the project to treat them as, in effect, primarily biological or genetic.
If a researcher is unable to meet these requirements because of an inability to disentangle what are perceived to be complexly intertwined social/genetic/biological variables or categories, the application and related documents may still be submitted to the relevant federal agency if the researcher provides an explanation and prominently incorporates the OMB caveat that “the racial and ethnic categories set forth in the standards [or application] should not be interpreted as being primarily biological or genetic in reference.”26
As a practical matter, individual researchers may find it difficult, given the design or nature of their projects, to break racial/ethnic and genetic population categories into their social, genetic, and nongenetic biological components. This is a major undertaking, but it is also necessary. These guidelines provide incentives to work out these issues, whereas the caveat allows researchers to proceed with their projects in a more deliberate manner while this difficult work progresses.
These recommendations are primarily procedural in nature. They preserve scientific autonomy and allow researchers and clinicians to define and act on their own conceptions of the relevance of the OMB categories of race and ethnicity to their own work. They would apply only to applications and other related documents submitted to the federal government.
Race and ethnicity are powerful categories. They have important roles to play in understanding a wide array of health-related phenomena. They must, however, be used with care. There are significant differences between using such categories to identify disparities in health outcomes and using them as proxies to try to identify underlying genetic causes of disease. I hope that these guidelines will promote more consistent and scientifically rigorous articulation, clarification, and application of these categories when applications and related documents are submitted to relevant federal agencies.
This work was supported by the National Human Genome Research Institute (grant R01 HG002818–01).
Thanks to the participants in the grant working group meetings and others for their helpful comments and suggestions: Donna Arnett, Guillermo Aviles-Mendoza, Rene Bowser, Rose Brewer, Ellen Clayton, Colin Campbell, Troy Duster, Phyllis Epps, Kim Fortun, Morris Foster, Jeffrey Kahn, Vivek Kapur, Richard King, Sandra Lee, Teri Manolio, Jonathan Marks, Michael Omi, Harry Orr, Susan Parry, Dorothy Roberts, Charmaine Royal, William Toscano, Rebecca Trotsky-Sirr, Karen-Sue Taussig, Samuel Wilson, Susan Wolf. Thanks also to the members of the NHGRI Human Genetics Variation Consortium for their comments on a related presentation.
Human Participant Protection. No human participants were involved in this study.