© 2003 American Public Health Association
The authors are with the Center for Health Quality, Outcomes and Economic Research (a Veterans Affairs Health Services Research and Development National Center of Excellence), Bedford VA Medical Center, Bedford, Mass, and the Health Services Department, Boston University School of Public Health, Boston, Mass. Correspondence: Requests for reprints should be sent to Nancy R. Kressin, PhD, Center for Health Quality, Outcomes and Economic Research, VA Medical Center, 200 Springs Rd, Building 70 (152), Bedford, MA 01730 (e-mail: nkressin{at}bu.edu).
Objectives. We examined agreement of administrative data with self-reported race/ethnicity and identified correlates of agreement. Methods. We used Department of Veterans Affairs (VA) administrative data and race/ethnicity data from the VA 1999 Large Health Survey. Results. Relatively low rates of agreement (approximately 60%) between data sources were largely the result of administrative data from patients whose race/ethnicity was unknown, with least agreement for Native American, Asian, and Pacific Islander patients. After exclusion of patients with missing race/ethnicity, agreement improved except for Native Americans. Agreement did not increase substantially after inclusion of data from individuals indicating multiple race/ethnicities. Patients for whom there was better agreement between data sources tended to be less educated, not living alone, younger, and White; to have sufficient food; and to use more inpatient VA care. Conclusions. Better reporting of race/ethnicity data will improve agreement between data sources. Previous studies using VA administrative data may have underestimated racial disparities.
The growing interest in racial disparities in the provision of health care has fostered an increase in the use of race/ethnicity data derived from administrative data files. Despite the increasing demand for and use of these data, their reliability has been examined in only a few studies. This omission is a significant one, because the reliability of racial designations is crucial for accurate estimation of racial disparities in health care. A few previous studies have examined the reliability of racial classifications in administrative data from specific states, the federal government, and national insurance programs. Blustein documented that racial classifications for patients with multiple admissions in hospital discharge data in New York state lacked reliability, especially for non-African American and non-White racial categories.1 When California birth certificate race/ethnicity data were compared with race/ethnicity information obtained by interview, Baumeister et al. found that the sensitivity of the birth certificate data was significantly lower for Native Americans.2 In a review of vital statistics on race and ethnicity, Hahn and colleagues noted inconsistencies between birth and death records of infants, especially for Hispanic persons and for races/ethnicities other than White and African American.3 Pan and colleagues compared racial designations in Medicare and Medicaid data, finding significant amounts of contradictory information on race/ethnicity between the programs, with the greatest discrepancies for Hispanic, "other," and Asian classifications.4 Boehmer and colleagues documented that study outcomes differed markedly depending on whether the source of race/ethnicity information was Department of Veterans Affairs (VA) administrative data or self-report data.
Specifically, additional race/ethnicity differences in the use of tooth extraction versus root canal therapy were found when self-report data were used.5 They also noted discrepancies in race/ethnicity classifications between the data sources. Because a number of recent studies on racial variations in cardiac care have been based on VA databases,6-10 understanding the accuracy of these data is especially important. One study examined the concordance between medical record data on race/ethnicity in the VA and race/ethnicity as recorded in inpatient administrative data files, finding good agreement.11 However, this finding is not surprising, given that medical record data serve as the source for inpatient data on race/ethnicity. The agreement of the administrative files with patient self-report was unknown, as were the sociodemographic and health factors associated with such agreement. The purpose of this study was to extend previous research by examining the agreement of VA administrative data on race/ethnicity with patient self-reported race/ethnicity, using information obtained from the largest federal survey ever conducted in the Veterans Health Administration.12 Thus, in addition to examining general rates of agreement, we assessed the effect of including or excluding patients with missing race/ethnicity information or with multiple race/ethnicity designations in the survey data. A secondary goal of our study was to identify any sociodemographic and health characteristics of patients associated with agreement between data sources.
Data Sources
VA administrative files. The VA maintains administrative files on inpatient and outpatient care received by veterans for each fiscal year. The Patient Treatment File (PTF) includes patient-level information about short-term discharges from VA inpatient care, including demographic and summary information about each episode of care. The Outpatient Clinic File (OPC) provides information on each outpatient visit in the VA and is organized by visit day. This file contains information about each patient, including sociodemographic characteristics. From the PTF and OPC files for 1996 to 1998, we created a single file with records for every veteran patient who received inpatient or outpatient care provided or paid for by the VA in the 3 years preceding administration of the survey described below.
1999 Large Health Survey of Veteran Enrollees. In 1999, the largest and most detailed survey of veterans using VA health services was conducted to ascertain their health status and health practices.12 Patients were sampled from the March 1999 enrollment file, which contained 3 760 200 enrollees. After exclusion of 146 323 veterans who had died or who had "bad" addresses, the final mailable sampling frame was 3 613 877. A total of 1 406 049 enrollees were sent surveys by mail using a stratified random sample (those who died or who were ineligible because of bad addresses were excluded), and a total of 887 775 surveys were received, resulting in a response rate of 63%. These surveys included questions about the patients' race/ethnicity, other basic sociodemographic characteristics, and health. We excluded patients whose race/ethnicity was missing, and for most analyses, we excluded patients who indicated more than 1 race/ethnicity, leaving an analysis sample of 730 149.
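The sampling-frame and response-rate arithmetic reported above can be checked with a few lines of Python. This is only an illustrative sketch: the constants are the figures quoted in the text, and the function and variable names are hypothetical, not part of the survey's own processing.

```python
# Figures as reported in the text for the 1999 Large Health Survey.
ENROLLEES = 3_760_200        # March 1999 enrollment file
EXCLUDED = 146_323           # deceased or "bad" addresses
SURVEYS_MAILED = 1_406_049   # stratified random sample
SURVEYS_RETURNED = 887_775


def mailable_frame(enrollees: int, excluded: int) -> int:
    """Size of the mailable sampling frame after exclusions."""
    return enrollees - excluded


def response_rate(returned: int, mailed: int) -> float:
    """Response rate as a percentage of surveys mailed."""
    return 100.0 * returned / mailed


if __name__ == "__main__":
    print("Mailable frame:", mailable_frame(ENROLLEES, EXCLUDED))
    print("Response rate (%):",
          round(response_rate(SURVEYS_RETURNED, SURVEYS_MAILED)))
```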
Measures
Large Survey data on race/ethnicity. Our measure of race/ethnicity was the patient's self-reported race/ethnicity provided in the 1999 Large Health Survey of Veteran Enrollees in response to a question developed by the Office of Management and Budget for use in federal surveys.13 Patients were asked to indicate their race/ethnicity in response to the following single question, "What is your race/ethnicity?" Patients were instructed to mark all responses that applied, including "American Indian or Alaskan native," "Asian," "Black or African American," "Spanish, Hispanic, or Latino," "Native Hawaiian or other Pacific Islander," or "White." We excluded patients from our sample whose race/ethnicity was missing from this file (n = 115 349; 13.1% of the original sample). Patients who had indicated more than 1 race/ethnicity (n = 34 113; 3.8% of the original sample) were also excluded from some analyses, leaving a sample of 730 149 veterans.
Sociodemographic characteristics. Age (in 1998) was taken from the administrative data. Patients were asked to report their educational level by selecting 1 of the following responses: "never attended school or only kindergarten," "grades 1 through 8," "grades 9 through 11," "grades 12 or GED (general equivalency diploma)," "college 1 year to 3 years," or "college graduate or graduate school." As in the Behavioral Risk Factor Surveillance System questionnaire, patients were asked, "In the past 30 days have you been concerned about having enough food for you or your family?" (yes or no) to determine sufficiency of food.14 For health status, we used a single item that assesses general health, drawn from the Veterans Short Form (SF)-36 ("In general, would you say your health is excellent, very good, good, fair or poor?").15 Patients were asked whether they lived alone (yes or no) and to indicate whether they were married, divorced, separated, widowed, or never married. For the purposes of our analyses, we dichotomized marital status, creating 2 groups: currently married and other. For employment status, patients were asked whether they were currently employed for wages, self-employed, looking for work and unemployed for more than 1 year, looking for work and unemployed for less than a year, retired, homemaker, student, or unable to work. These responses were grouped as employed, retired, unemployed, or "other." For some analyses, these categories were further collapsed into employed or not employed. For health care utilization, we calculated the total number of inpatient stays and the total number of outpatient visits between 1996 and 1998 from the administrative data.
Analyses
Sociodemographic characteristics of the sample are shown in Table 1.
We examined agreement in the racial designations between the 2 files, as shown in Table 2.
Race/ethnicity was designated as unknown in the administrative files for 36% of patients. We therefore calculated percentage agreement when the veterans for whom race/ethnicity was unknown in the administrative files were excluded. As shown in the right-hand columns of Table 2
Patients with more than 1 self-reported race/ethnicity presented more opportunities for concordant classification with the administrative data (e.g., a man who considers himself both White and Native American had 2 chances for being administratively classified in a category consistent with 1 of his self-designations). Thus, we conducted an additional analysis to examine concordant classifications between all patients with 1 or 2 self-designations of race/ethnicity (99.6% of the sample) and the administrative records of race/ethnicity. As shown in Table 3
Next, we examined factors associated with agreement between types of records (results not shown in tables). Again focusing on patients with only 1 racial designation, we conducted bivariate comparisons of patients for whom there was agreement between the administrative and survey data with patients for whom there was not on sociodemographic, health, and health care utilization variables. We included patients whose race/ethnicity was unknown in the administrative data. Compared with patients for whom there was no agreement, patients for whom there was agreement regarding race/ethnicity were less likely to have more than a high school education (29% vs 21%), less likely to be employed (22% vs 35%), more likely to indicate that having sufficient food was a problem (15% vs 12%), more likely to live alone (25% vs 20%), and less likely to be married (59% vs 66%; all P < .0001, using χ² tests). Patients with agreement regarding race/ethnicity had worse self-perceived general health (3.68 vs 3.29; a higher score indicates worse health), more inpatient stays (0.80 vs 0.07), and more outpatient visits (37.2 vs 13.6; all P < .0001, using t tests).
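The percentage-agreement calculations described in this section, with and without patients whose administrative race/ethnicity is unknown, can be sketched as follows. This is a minimal illustration, not the authors' program: the records are made-up toy data, the "Unknown" label and all function names are assumptions, and agreement is simply the share of each self-reported group whose administrative label matches.

```python
from collections import Counter

UNKNOWN = "Unknown"  # hypothetical label for missing administrative race


def percent_agreement(records, exclude_unknown=False):
    """Percent of each self-reported group whose admin label matches.

    records: iterable of (self_reported, administrative) label pairs.
    exclude_unknown: drop records whose admin label is UNKNOWN first.
    """
    totals, matches = Counter(), Counter()
    for self_reported, administrative in records:
        if exclude_unknown and administrative == UNKNOWN:
            continue
        totals[self_reported] += 1
        if administrative == self_reported:
            matches[self_reported] += 1
    return {group: 100.0 * matches[group] / totals[group] for group in totals}


def is_concordant(self_designations, administrative):
    """For patients with 2 self-designations, either one counts as a match."""
    return administrative in self_designations


# Toy data only; these counts are invented for illustration.
TOY_RECORDS = (
    [("White", "White")] * 6
    + [("White", UNKNOWN)] * 4
    + [("Native American", "White"),
       ("Native American", "Native American"),
       ("Native American", UNKNOWN),
       ("Native American", UNKNOWN)]
)

if __name__ == "__main__":
    print(percent_agreement(TOY_RECORDS))
    print(percent_agreement(TOY_RECORDS, exclude_unknown=True))
```

In the toy data, dropping the unknown records raises agreement for both groups, which mirrors the direction of the effect described in the text without reproducing its actual values.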
Finally, we conducted multivariate logistic regression analyses to examine the unique association of sociodemographic and health care utilization variables with known race in the administrative data, and then to examine the association of these variables with agreement on race/ethnicity between administrative and self-report data (Table 4).
The purpose of this study was to examine agreement between administrative data on race/ethnicity in VA data files and patients' self-reported race/ethnicity and to identify sociodemographic and health characteristics of patients who have agreement on race/ethnicity between the 2 data files. Our results indicated fairly poor overall agreement between administrative data and self-reported race/ethnicity; the best rates of agreement (for Whites and African Americans) were only approximately 60%, and rates of agreement were even lower for other racial groups. We found very high levels of missing or unknown race/ethnicity in the administrative data files, and not surprisingly, when we excluded patients for whom race/ethnicity was unknown, we noted markedly higher rates of agreement between the administrative and self-report data. When the VA data listed a known race/ethnicity, the data agreed with the patients' self-report over 90% of the time for Whites and African Americans, over 80% of the time for Hispanics, and about 70% of the time for Asians and Pacific Islanders, but still only about 20% of the time for Native Americans. Thus, a strategy the VA may consider to enhance the agreement of the administrative race/ethnicity data with self-report data is to reduce the number of patients with unknown race/ethnicity, either by supplementing the administrative data with data from the Large Survey files or by embarking on new efforts to gather information on race/ethnicity from VA patients for whom race/ethnicity is currently unknown. Indeed, a new policy on the collection of race data implemented by the VA in 2003 will improve the quality of newly collected data, because the policy makes the completion of both race and ethnicity data fields mandatory and includes an indicator for specifying how the determination was made (visual or self-identification).
One other possible strategy for improving the concordance between VA and self-reported race/ethnicity data is to consider patients' multiple races/ethnicities when making administrative classifications (e.g., allowing patients to indicate more than 1 race/ethnicity, as is done on the US Census), thereby allowing more opportunities for administrative designations to match patients' self-designations. Our analyses showed that this approach would improve agreement when patients considered themselves combinations of Hispanic, White, or African American, but not when other combinations of races/ethnicities are involved. However, given that these 3 combinations of racial classifications accounted for only 0.06% of our total sample, this strategy is unlikely to have a large effect on the overall quality of the administrative race/ethnicity data. In the bivariate results, the agreement between VA race/ethnicity data and self-report was similar for White, African American, and Hispanic patients and notably lower for Asian, Pacific Islander, and Native American patients. However, the logistic regression results indicated that after control for a variety of sociodemographic factors, African Americans and "others" (including Hispanic, Asian, Pacific Islander, and Native American patients) were less likely than Whites to have agreement between the data sources. This implies that studies examining racial disparities in health care within the VA are particularly likely to have poor agreement with self-reported race/ethnicity data for non-White patients. However, because African Americans were described as White almost 5% of the time, whereas Whites were described as African American only 0.44% of the time, estimates of racial disparities between these 2 groups are likely to be diminished owing to the characteristics of the administrative database.
Similarly, Hispanic patients were listed as White almost 11% of the time, whereas Whites were listed as "other" less than 1% of the time; estimates of disparities between these 2 groups are also likely attenuated by the classification as White of a significant proportion of Hispanic patients in the administrative data. To illustrate this point, consider a prominent VA study on racial disparities in cardiac care. Whittle and colleagues6 observed significantly different crude rates of cardiac catheterization, 19.3% for Whites and 11.8% for African Americans; they observed rates of 1.8% and 0.8%, respectively, for percutaneous transluminal coronary angioplasty; and they observed rates of 5.0% and 1.6%, respectively, for coronary artery bypass grafting. If African Americans were misclassified as White 5% of the time and Whites were misclassified as African American 0.44% of the time, as in our findings, correct identification of race/ethnicity may actually have inflated the procedure rates for Whites and reduced them for African Americans. This assumes that the African Americans misclassified as Whites had rates of invasive procedures similar to those of other African Americans. Consequently, use of self-reports of veterans' race/ethnicity could actually have magnified the racial differences in procedure use observed by Whittle et al. Our results highlight important implications of the quality of VA data on race/ethnicity for past and present research findings. These results extend those in the previous literature because of their focus on administrative data from the VA's national databases.
This focus is an important addition to the field, as so many studies of health and health disparities rely on the VA's data on race/ethnicity.6-8,10 The results also extend those of Boehmer et al.5 by detailing the effects on rates of agreement of excluding patients with unknown race, as well as by elucidating the sociodemographic and health factors associated with available race data and with agreement between self-reported and administrative data on race. This study was limited by its reliance on a summary file of VA administrative data on race/ethnicity, which grouped individuals of Hispanic, Pacific Islander, Asian, and Native American race/ethnicity together. Because all these groups were included in the "other" category, the level of detail originally available in the VA databases for each racial category was eliminated in this file. Thus, we could not report levels of agreement with self-reported race/ethnicity within each of these 4 categories. However, the benefit of using this file is that it summarizes 3 years of administrative data on race/ethnicity, as opposed to containing data from a single year. Our exploration of factors associated with agreement between self-reported race/ethnicity and race/ethnicity in the administrative data files indicated that younger, less-educated patients who possess some social and material resources and who use VA inpatient care more often are likely to have higher levels of agreement between race/ethnicity information in the 2 databases we studied. In contrast, patients whose administrative data are most likely to include known race have consistently fewer social and economic resources, report worse health, and use more inpatient and outpatient VA care.
Other studies have noted that users of VA care are more likely to have a low family income and low labor force participation and are less likely to have a family physician or private health insurance.16 Our results suggest that even within this socioeconomically challenged population, those using the system more often are more likely to have agreement between self-report and administratively classified race/ethnicity. Thus, results indicate that the more opportunities the VA has to record race/ethnicity, the more likely its data are to agree with patient self-reports.
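The attenuation argument made above for the Whittle et al. catheterization rates can be worked through numerically. The sketch below is a hypothetical illustration only: it uses the misclassification proportions and crude rates quoted in the text, but the group sizes are invented, patients with unknown or "other" race/ethnicity are ignored, and misclassified patients are assumed to have the same procedure rate as the rest of their self-reported group. All function names are assumptions.

```python
def label_mix(n_white, n_black, p_black_as_white, p_white_as_black):
    """Share of each administrative label made up of the other true group."""
    black_labeled_white = n_black * p_black_as_white
    white_labeled_black = n_white * p_white_as_black
    white_labeled_white = n_white - white_labeled_black
    black_labeled_black = n_black - black_labeled_white
    black_share_in_white_label = black_labeled_white / (
        white_labeled_white + black_labeled_white)
    white_share_in_black_label = white_labeled_black / (
        white_labeled_black + black_labeled_black)
    return black_share_in_white_label, white_share_in_black_label


def observed_rates(true_white, true_black, mix):
    """Rates seen under administrative labels, given true group rates."""
    b_in_w, w_in_b = mix
    obs_white = (1 - b_in_w) * true_white + b_in_w * true_black
    obs_black = w_in_b * true_white + (1 - w_in_b) * true_black
    return obs_white, obs_black


def corrected_rates(obs_white, obs_black, mix):
    """Invert the 2x2 mixing above to recover the implied true rates."""
    b_in_w, w_in_b = mix
    det = (1 - b_in_w) * (1 - w_in_b) - b_in_w * w_in_b
    true_white = ((1 - w_in_b) * obs_white - b_in_w * obs_black) / det
    true_black = ((1 - b_in_w) * obs_black - w_in_b * obs_white) / det
    return true_white, true_black


if __name__ == "__main__":
    # Group sizes are hypothetical; proportions and rates are from the text.
    mix = label_mix(n_white=8000, n_black=2000,
                    p_black_as_white=0.05, p_white_as_black=0.0044)
    white, black = corrected_rates(0.193, 0.118, mix)
    print(f"Implied catheterization rates: White {white:.4f}, "
          f"African American {black:.4f}")
```

Because each administrative label is a weighted mix of the two self-reported groups, the observed rates always lie between the true ones, so under these assumptions the corrected gap is wider than the observed gap. How much wider depends on the assumed group sizes.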
Funding for this work was provided in part by the Department of Veterans Affairs Health Services Research and Development Service project ECV 97022.2 (N. Kressin, principal investigator), and by the Office of Quality and Performance, Department of Veterans Affairs Headquarters. Drs Kressin and Kazis are recipients of Health Services Research and Development Service Research Career Scientist Awards. We gratefully acknowledge the programming assistance of Arkadiy Pitman, Megan Amuan, and Michelle Orner.
Human Participation Protection
Note. The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.
Contributors
Accepted for publication May 30, 2003.
1. Blustein J. The reliability of racial classifications in hospital discharge abstract data. Am J Public Health. 1994;84:1018-1021.
2. Baumeister L, Marchi K, Pearl M, Williams R, Braveman P. The validity of information of "race" and "Hispanic ethnicity" in California birth certificate data. Health Serv Res. 2000;35:869-883.
3. Hahn RA, Mulinare J, Teutsch SM. Inconsistencies in coding of race and ethnicity between birth and death in US infants: a new look at infant mortality, 1983 through 1985. JAMA. 1992;267:259-263.
4. Pan CX, Glynn RJ, Mogun H, Choodnovskiy I, Avorn J. Definition of race and ethnicity in older people in Medicare and Medicaid. J Am Geriatr Soc. 1999;47:730-733.
5. Boehmer U, Kressin N, Berlowitz D, Christiansen C, Kazis L, Jones J. Self-reported vs administrative race/ethnicity data and study results. Am J Public Health. 2002;92:1471-1473.
6. Whittle J, Conigliaro J, Good CB, Lofgren RP. Racial differences in the use of invasive cardiovascular procedures in the Department of Veterans Affairs medical system. N Engl J Med. 1993;329:621-627.
7. Peterson ED, Wright SM, Daley J, Thibault GE. Racial variation in cardiac procedure use and survival following acute myocardial infarction in the Department of Veterans Affairs. JAMA. 1994;271:1175-1180.
8. Mirvis DM, Burns R, Gaschen L, Cloar FT, Graney M. Variation in utilization of cardiac procedures in the Department of Veterans Affairs health care system: effect of race. J Am Coll Cardiol. 1994;24:1297-1304.
9. Mickelson JK, Blum CM, Geraci JM. Acute myocardial infarction: clinical characteristics, management and outcome in a metropolitan Veterans Affairs Medical Center teaching hospital. J Am Coll Cardiol. 1997;29:915-925.
10. Mirvis DM, Graney MJ. Variations in the use of cardiac procedures in the Veterans Health Administration. Am Heart J. 1999;137:706-713.
11. Kashner TM. Agreement between administrative files and written medical records: a case of the Department of Veterans Affairs. Med Care. 1998;36:1324-1336.
12. Kazis L. Health Status and Outcomes of Veterans: Physical and Mental Component Summary Scores Veterans SF-36. 1999 Large Health Survey of Veteran Enrollees. Executive Report. Washington, DC; Bedford, MA: Department of Veterans Affairs, Veterans Health Administration, Office of Quality and Performance and VHA Health Assessment Project, Center for Health Quality, Outcomes and Economic Research; May 2000.
13. Office of Management and Budget. Standards for maintaining, collecting and presenting federal data on race and ethnicity. Available at: http://www.whitehouse.gov/omb/inforeg/r&e_app-a-update.pdf. Accessed March 18, 2002.
14. National Center for Chronic Disease Prevention and Health Promotion. Behavioral Risk Factor Surveillance System Survey Data. Hyattsville, Md: National Center for Chronic Disease Prevention, Centers for Disease Control and Prevention and Health Promotion, US Dept of Health and Human Services; 1999.
15. Ware J, Kosinski M, Bayliss MS. Comparison of methods for the scoring and the statistical analysis of the SF-36 health profile and summary measures: summary of results from the Medical Outcomes Study. Med Care. 1995;33:AS264-AS279.
16. Wolinsky FD, Coe RM, Mosely RM 2nd, Homan SM. Veterans' and non-veterans' use of health services: a comparative analysis. Med Care. 1985;23:1358-1371.