We have summarized the evolution of the Nurses’ Health Study (NHS), a prospective cohort study of 121 700 married registered nurses launched in 1976; NHS II, which began in 1989 and enrolled 116 430 nurses; and NHS3, which began in 2010 and has ongoing enrollment.
Over 40 years, these studies have generated long-term, multidimensional data, including lifestyle- and health-related information across the life course and an extensive repository of various biological specimens. We have described the questionnaire data collection, disease follow-up methods, biorepository resources, and data management and statistical procedures.
Through integrative analyses, these studies have sustained a high level of scientific productivity and substantially influenced public health recommendations. We have highlighted recent interdisciplinary research projects and discussed future directions for collaboration and innovation.
EDITOR’S NOTE: Because of space restrictions and the large volume of references relevant to the Nurses’ Health Study, additional references are provided in a supplement to the online version of this article at http://www.ajph.org.
Oral contraceptives were first marketed both in the United Kingdom and the United States in the early 1960s. By 1966, the Medical Research Council of Great Britain Statistical Research Unit, headed by Sir Richard Doll, was receiving reports of healthy young women who were suffering from thrombophlebitis and pulmonary emboli that were presumed to be associated with taking oral contraceptives. Frank Speizer and Martin Vessey, who were working in the unit, noted these reports and raised concerns that millions of healthy women were likely to be exposed to these drugs and that there were no plans to follow the potential long-term consequences of exogenous hormone use. To undertake proper investigations of this topic, it was clear that long-term follow-up of a large cohort of women would be required.
On the basis of the success of the British Doctors’ Study of Smoking,1 pilot studies were begun both in the United Kingdom and the United States, where doctors were asked to pass along to their spouses a mailed questionnaire. As it turned out, there were too few women married to British physicians in the appropriate age group who would have had an opportunity to use oral contraceptives to make such a study in the United Kingdom feasible.
With promising results from the US pilot studies, Speizer, who was at Harvard Medical School at the time, submitted a National Cancer Institute grant application that was eventually successful. However, by the time the funding was received in 1974, the follow-up questionnaire had been revised to cover a 2-year period and needed to be repiloted. In the second pilot mailing, questionnaires were addressed to 3 groups on the basis of responses from the first pilot: “Doctor, give this to your wife,” “Ms,” and “Mrs.” The worst response came from the “Ms” group. Women who were married and aged 30 to 55 years in 1974 did not like to be called “Ms.” The most important information came from the “Mrs” group, in that a substantial portion of the women in this group reported that they had not seen the initial questionnaire. Their physician husbands had presumably filled out the form for them. We therefore redirected the effort to nurses, because we believed it was important to have a medically sophisticated group of study participants.
In 1976, Speizer et al. mailed the baseline questionnaire to eligible married female registered nurses aged 30 to 55 years who resided in 1 of 11 US states with the largest number of registrants (New York, California, Pennsylvania, Ohio, Massachusetts, New Jersey, Michigan, Texas, Florida, Connecticut, and Maryland). A total of 121 700 nurses returned the questionnaire (71.2% of those invited), and the cohort reflected the racial composition of nurses at that time (97% White). Although the participants had a slightly higher socioeconomic status than did the general population and were mostly White, which may have initially affected generalizability, the population selection enhances internal validity because the health knowledge and commitment to research of the nurses contribute to high-quality and complete self-reported health data as well as high follow-up rates.
Women continue to be followed via biennial questionnaires, making the NHS the first cohort of its size with repeated data collection. The range of lifestyle and health outcome data collected has significantly expanded over time, beginning with the inclusion in 1980 of questions about physical activity and a semiquantitative food frequency questionnaire (FFQ),2 which was piloted and completed by the cohort under the direction of Walter Willett. The expanded collection of data was facilitated by the use of optically scanned questionnaires starting in 1982. After a detailed validation study, the FFQ was significantly expanded in 1984 and 1986. Since that time it has been repeated every 4 years. Additional behavioral and lifestyle risk factors have been repeatedly assessed over the years (see the box), an extensive biorepository has been established (Figure 1; Tables 1 and 2), and the array of chronic diseases and other outcomes has expanded steadily over time (Table 3).
Employment status, shift work | Smoke exposure |
Family history of cancer or heart or other disease | Living arrangement |
Reproductive history and menopause | Neighborhood characteristics |
Prescription and over-the-counter medications | Environmental exposures |
Cancer and other screening tests | Mental health |
Leisure time physical activity, sedentary time | Social networks |
Sleep patterns | Optimism scale |
Alcohol use | Caregiving and caregiving stress |
Weight, height, waist, and hip measurements | Quality of life |
Diet (including during adolescence) | Activities of daily living |
Study | Year | Sample Type | Description | No. Samples |
NHS | ||||
Main toenail cohort | 1982–1984 | Toenails | Nail clippings from all 10 toes | 68 213 |
Main blood cohort | 1989–1990 | Blood | | 32 826 |
Main blood cohort, 2nd collection | 2000–2002 | Blood, urine | Only women who provided a 1st collection blood sample; first morning spot urine | 18 743 |
Reproducibility study | 1989–1990; 1991; 1992 | Blood | Repeat blood samples from the same participants | 227 |
Nested folate trial | 1996; 1999 | Blood | Repeat blood samples from the same participants | 685 |
Main cheek cohort | 2002–2004 | Buccal cells | Among women without blood samples | 33 040 |
Renal function cohort | 2003; 2007–2008 | Blood, urine | Repeat blood and urine samples from the same participants | 1 992 |
Cognitive function cohort | 2007 | Blood | | 130 |
Diet and lifestyle validation study | 2009– | Blood, urine, saliva | Four 24-h urine specimens; 2 blood specimens | 375 |
NHS II | ||||
Main blood cohort | 1996–1999 | Blood | Among premenopausal women, samples were collected during the follicular and midluteal phases of the menstrual cycle | 29 611 |
Reproducibility study | 1996–1999; 2000; 2001 | Blood, urine | Repeat blood and urine samples from the same participant | 297 |
Renal function cohort | 2003; 2008 | Blood, urine | Repeat blood and urine samples from the same participants | 1 847 |
Main cheek cohort | 2004–2006 | Buccal cells | Among women without blood samples | 29 392 |
Melatonin study | 2009 | 24-h urine | | 180 |
Diet and lifestyle validation study | 2009– | Blood, urine, saliva | Four 24-h urine specimens; 2 blood specimens | 375 |
Main blood cohort, 2nd collection | 2008–2011 | Blood, urine | Only women who provided a 1st collection blood sample; first morning spot urine | 17 275 |
Diabetes and Women’s Health | 2012 | Blood, urine, toenails | Only women with a history of gestational diabetes mellitus | 2 089 |
Mind Body Study | 2013–2014 | Blood, urine, timed saliva, stool, hair, toenails | Among the lifestyle validation study participants | 250 |
Note. NHS = Nurses’ Health Study. Blood includes plasma, white blood cells, and red blood cells.
Disease | NHS Follow-Up Years | NHS Cases Collected, No. | NHS II Follow-Up Years | NHS II Cases Collected, No. |
Breast cancer | 1976–2010 | 7 459 | 1991–2011 | 2 705 |
Colorectal cancer | 1980–2010 | 1 048 | 1991–2011 | 76 |
Ovarian cancer | 1976–2010 | 486 | 1989–2011 | 131 |
Melanoma | 1976–2004 | 343 | 1991–2009 | 140 |
Pancreatic cancer | 1980–2000 | 173 | | 0 |
Renal cancer | 1976–2004 | 163 | | 0 |
Brain cancer | 1976–2000 | 88 | | 0 |
Bladder cancer | 1980–2000 | 40 | | 0 |
Colorectal adenoma | 1980–2008 | 485 | | 0 |
Non-Hodgkin’s lymphoma | 1978–2010 | 385 | 1989–2011 | 84 |
Benign breast disease | 1976–1998 | 3 826 | 1989–2011 | 3 289 |
Barrett’s esophagus | 2000–2004 | 329 | | 0 |
Endometrial cancer | 1978–2012 | 475 | | 0 |
Meningioma | 1992–2010 | 24 | 1993–2009 | 40 |
Note. NHS = Nurses’ Health Study. In NHS II, tissue collection for cancers such as pancreatic cancer and renal cancer is planned for the near future.
List of Selected Outcomes Studied in the Nurses’ Health Study and Estimates of Total Available Cases
Outcome | NHS Total Events, No. | NHS II Total Events, No. | Reference (See Online Appendix) |
General | |||
Total mortality | 22 800 | 2 600 | Kawachi et al. (1993) |
Weight | All participants | All participants | Fine et al. (1999) |
Incident diabetes | 17 500 | 9 000 | Hu et al. (2001) |
Incident primary hyperparathyroidism | 350 | NC | Vaidya et al. (2015) |
Incident kidney stones | 1 500 | 1 800 | Ferraro et al. (2014) |
Incident depression (diagnosis and treatment) | 11 500 | 12 500 | Chang et al. (2016) |
Cancer outcomes | |||
Incident breast cancer | 13 000 | 5 300 | Hankinson et al. (1998) |
Incident colorectal cancer | 3 000 | 400 | Giovannucci et al. (1995) |
Incident endometrial (uterine) cancer | 2 000 | 700 | De Vivo et al. (2002) |
Incident pancreatic cancer | 800 | 50 | Michaud et al. (2001) |
Incident lung cancer | 3 600 | 250 | Feskanich et al. (2000) |
Incident squamous cell skin cancer | 2 400 | 500 | Siiskonen et al. (2016) |
Incident basal cell skin cancer | 23 700 | 8 900 | Wu et al. (2015) |
Incident ovarian cancer | 1 300 | 290 | Hankinson et al. (1995) |
Incident bladder cancer | 800 | 70 | McGrath et al. (2007) |
Incident kidney and ureter cancer | 500 | 200 | Cho et al. (2013) |
Incident esophageal cancer | 170 | 20 | Song et al. (2016) |
Incident brain cancer | 300 | 30 | Holick et al. (2007) |
Incident Hodgkin’s lymphoma | 100 | 50 | Abel et al. (2010) |
Incident non-Hodgkin’s lymphoma | 1 400 | 400 | Zhang et al. (2000) |
Incident melanoma | 1 200 | 500 | Wu et al. (2015) |
Incident multiple myeloma | 300 | 40 | Birmann et al. (2007) |
Incident leukemia | 250 | 80 | Schernhammer et al. (2012) |
Cardiovascular | |||
Incident myocardial infarction | 7 000 | 700 | Mukamal et al. (2005) |
Incident hypertension | 76 000 | 25 300 | Forman et al. (2008) |
Incident stroke | 2 500 | 700 | Rexrode et al. (1997) |
Incident sudden cardiac death | 500 | NC | Chiuve et al. (2011) |
Incident peripheral artery disease | 140 | NC | Bertoia et al. (2013) |
Eyes and eyesight | |||
Incident glaucoma | 1 000 | NC | Kang et al. (2010) |
Incident macular degeneration | 1 800 | NC | Cho et al. (2000) |
Incident cataracts | 4 200 | NC | Chasen-Taber et al. (1999) |
Gastrointestinal | |||
Incident polyps | | | Platz et al. (2000) |
Adenoma only | 5 200 | 3 800 | |
Hyperplastic only | 3 100 | 3 300 | |
Adenoma and hyperplastic | 1 300 | 900 | |
Incident gastrointestinal bleeding | 1 700 | 800 | Huang et al. (2011) |
Incident ulcerative colitis | 340 (NHS and NHS II combined) | | Ananthakrishnan et al. (2012) |
Incident Crohn’s disease | 270 (NHS and NHS II combined) | | Ananthakrishnan et al. (2012) |
Gynecologic | |||
Gestational diabetes | NC | 6 000 | Zhang et al. (2014) |
Incident infertility | NC | 3 200 | Chavarro et al. (2007) |
Incident pregnancy loss (spontaneous abortion and stillbirth) | NC | 4 500 | Gaskins et al. (2014) |
Incident endometriosis | NC | 5 500 | Shah et al. (2013) |
Incident uterine leiomyomata | NC | 9 800 | Terry et al. (2010) |
Incident benign breast disease (centrally reviewed) | NC | 2 000 | Liu et al. (2013) |
Premenstrual syndrome | NC | 1 300 | Bertone-Johnson et al. (2015) |
Neurologic | |||
Cognitive function | 19 400 | NC | Weuve et al. (2004) |
Incident amyotrophic lateral sclerosis | 150 | NC | Wang et al. (2011) |
Incident multiple sclerosis | 210 | 350 | Munger et al. (2003) |
Incident Parkinson’s disease | 700 | 80 | Simon et al. (2007) |
Incident seizures or epilepsy | NC | 250 | Dworetzky et al. (2010) |
Incident hearing loss | 19 000 | 12 000 | Curhan et al. (2015) |
Restless leg syndrome | 900 | 4 200 | Li et al. (2012) |
Pulmonary | |||
Incident chronic obstructive pulmonary disease | 5 700 | 2 500 | Varraso et al. (2007) |
Incident asthma | 15 500 | 21 100 | Camargo et al. (1999) |
Incident pulmonary embolism | 1 600 | 700 | Kabrhel et al. (2011) |
Incident pneumonia | NC | 1 200 | Neuman et al. (2010) |
Autoimmune and musculoskeletal disorders | |||
Incident gout | 1 000 | 370 | Hak et al. (2010) |
Incident psoriasis | 1 600 | 1 600 | Wu et al. (2014) |
Incident systemic lupus erythematosus | 200 | 130 | Costenbader et al. (2007) |
Incident rheumatoid arthritis | 1 000 | 500 | Sparks et al. (2016) |
Incident hip fractures | 3 400 | 700 | Meyer et al. (2016) |
Note. NC = not collected; NHS = Nurses’ Health Study. The table shows approximate estimates. The number of cases used in a specific study depends on the disease definition used, start of follow-up used, and the number of exclusions (e.g., missing the exposure). Outcome assessment start time and methods are not always consistent across endpoints and between cohorts.
In 1976, at the beginning of the NHS, the women who were aged 30 to 55 years would have had few opportunities to have used oral contraceptives for prolonged periods before their first pregnancy. To study the health effects of oral contraceptive use and other risk factors during early reproductive life, Willett et al. enrolled 116 430 nurses aged 25 to 42 years in 1989 and residing in 1 of 14 states (California, Connecticut, Indiana, Iowa, Kentucky, Massachusetts, Michigan, Missouri, New York, North Carolina, Ohio, Pennsylvania, South Carolina, and Texas), creating the Nurses’ Health Study II (NHS II). The baseline questionnaire included a color booklet of all brands and types of oral contraceptives ever sold in the United States and questions about lifetime oral contraceptive use, including details on duration and type, information that was not obtained in the NHS. Other new information was obtained on exposures in adolescence and early adult life, including physical activity, alcohol consumption, body fat profile, and diet.
Because of the development of computers and software capabilities, and to reduce costs, participants were given the option to respond to Web-based questionnaires beginning in 2001, and by 2011 70% of active NHS II participants had switched to online questionnaires. NHS II also established a large biorepository beginning with blood samples collected in 1996, including samples collected specifically during the luteal and follicular phases of the menstrual cycle. To investigate factors that influence weight change, 27 805 children (aged 9–14 years) of NHS II nurses were enrolled in their own follow-up study, the Growing Up Today Study. This study had 2 enrollment waves, in 1996 and 2004, and was initially under the leadership of Graham Colditz (now led by Stacey Missmer).
The Nurses’ Health Study 3 (NHS3), currently recruiting, began in 2010 (led by Jorge Chavarro) and has enrolled more than 40 000 female nurses aged 19 to 49 years residing throughout the United States and Canada, with 14% self-identifying as members of a racial or ethnic minority. Beginning in 2015, recruitment was extended to male nurses. NHS3 participants complete Web-based questionnaires every 6 months, collecting data on current exposures as well as exposures during adolescence, and in a substudy, participants report extensive information on exposures before, during, and immediately after pregnancy.
Active follow-up rates as of the writing of this article were 72% for the 6-month follow-up questionnaire, 82% for the 12-month follow-up questionnaire, between 90% and 94% for women who completed more than a year in the study, and 93% for women electing to answer pregnancy-specific questionnaires. Follow-up rates do not differ significantly by race/ethnicity and other demographic characteristics. NHS3 has taken advantage of new technologies (Web-based questionnaires and mobile devices), and the subsequent personalization of study timelines (e.g., reporting pregnancies in “real time”) has increased the focus on prospective data collection during key life course periods.
With continuous data collection since 1976, the NHSs have generated layers of resources, from behavioral data across the life course to various biological specimens. To accommodate the wealth of high-dimensional data and maximize cost efficiency, we have developed an effective infrastructure that supports questionnaire development and processing, cohort follow-up, an extensive biorepository, and data management and statistical analysis.
Our biennial questionnaires have produced a unique resource of lifestyle- and health-related data collected continuously over the past 40 years (see the box). The implementation of optically scanned questionnaires in 1982 allowed faster, more efficient, and more reliable data entry, making room for growth in the scientific scope of the study with longer questionnaires and lower processing time and cost. In addition to questions regarding diet and lifestyle behaviors, we have regularly assessed anxiety, depression, optimism, and social networks, and we have administered the Medical Outcomes Short Form-36 (SF-36), a validated3 scale of functional health and well-being.
We have also developed measures of environmental exposures (e.g., air pollution and ultraviolet radiation) and neighborhood characteristics (e.g., socioeconomic status, walkability, access to green space) on the basis of residential histories. NHS II participants additionally reported body size at ages 5, 10, and 20 years in 1989 and physical activity at ages 12 to 17 years in 1997; a subset of 41% of the women completed a supplemental FFQ asking about high school diet in 1998. In 2001/2002, 39 904 mothers of NHS and NHS II participants (90% NHS II) provided detailed information about their prepregnancy, pregnancy, and early life experiences with our cohort members.4 These unique data give us the opportunity to study relations of disease risk to a comprehensive set of social, lifestyle, and environmental exposures at multiple important time points across the life course.
Beginning in 1980 the NHS questionnaires included a validated FFQ every 2 to 4 years.5 This FFQ was expanded from 61 to 116 items in 1984 and to 131 items in 1986. The FFQ was further modified in NHS3 to include more detailed questions about cooking methods and is administered during key windows of exposure (e.g., pregnancy). A team of research dietitians works to maintain a complete and accurate database of the nutrient content of foods included on the FFQ, updating the nutrient components of foods that change over time (e.g., trans fat) and researching the nutrient content of new or reformulated products. Our data on the fatty acid composition of foods are unique because we have directly analyzed representative samples of foods every 4 years to account for changes in manufacturing. These data are used in combination with national consumption patterns that are analyzed each year that the FFQ is administered. As described in an accompanying article, the validity of these dietary questionnaires has been assessed extensively.5
Similarly, the cohort questionnaires include repeated measures of physical activity using a modified Paffenbarger questionnaire. Participants report the average time per week over the last year spent doing a list of specific activities (e.g., jogging), selected to represent the most important contributors to total activity. The physical activity questionnaire was initially validated among 231 NHS II women who completed both 1-week activity recalls and 7-day activity diaries as comparison methods.6 For total physical activity, the correlation with the recalls was 0.79 and with the diaries it was 0.62. We are currently conducting additional validation studies that include participants in the 3 cohorts to further evaluate the validity of these methods.5
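To illustrate how a validity coefficient of this kind is computed, the following minimal sketch calculates a correlation between questionnaire-reported activity and a comparison method. It assumes a Pearson correlation (the text does not state which coefficient was used), and the function name `pearson_r` and the data values are hypothetical, not study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical hours/week of activity for 6 participants (illustration only).
questionnaire = [1.0, 2.5, 0.5, 4.0, 3.0, 6.0]
diary = [1.5, 2.0, 1.0, 3.5, 4.0, 5.0]
r = pearson_r(questionnaire, diary)
print(round(r, 2))
```

A coefficient near 1 would indicate that the questionnaire ranks participants much as the comparison method does; it does not by itself show agreement in absolute amounts.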
Our research on various mailing strategies determined that certified mail was the most effective approach for reaching study participants who did not respond to an initial mailing; therefore, participants who did not respond to 3 regular mailings were re-sent the questionnaire via certified mail.7 Because of increasing costs, certified mail has been replaced with hand-addressed mailings for the majority of participants over the past 7 years, with a response rate of approximately 30% for the first hand-addressed mailing and approximately 15% for the second and final mailing after 6 to 7 regular mailings without a response. With all mailings, the overall response rate for participants as of 2012 is 86.2%. This high retention rate stems from many factors, including but not limited to the selection of motivated health professionals, frequent contact with participants, repeated mailings followed by hand-addressed letters to nonresponders, the use of a shorter questionnaire in the final mailings to nonresponders, and an annual participant newsletter.
Participants report newly diagnosed diseases biennially on their follow-up questionnaires. For any new report of disease (Table 3 presents a list of selected outcomes), we ask permission to review medical records and collect pathology specimens. Once permission is obtained, our software generates repeated mailings to hospitals to obtain medical records.
In virtually every instance in which we have signed permission, we have obtained documentation of cancer diagnoses. Physicians blinded to questionnaire exposure data review all medical records to confirm self-reported diseases. Participants additionally report conditions and characteristics that are not confirmed by record review but have been shown to be validly self-reported among these health professionals, such as hypertension,8 high cholesterol,8 and weight.9 Furthermore, we systematically search the National Death Index and state tumor registries. Searching the National Death Index is a highly effective method for monitoring deaths,10 and it now provides all listed causes of death, reducing the need to obtain death certificates.
NHS and NHS II have a rich resource of biospecimens, including blood, urine, toenails, buccal cells, stool, and saliva (Figure 1; Tables 1 and 2), and the collection of repeated samples substantially increases statistical power for analyses of age and latency effects. The different biospecimens provide an extensive range of research opportunities; for example, we have examined plasma and urinary sex hormones, dietary biomarkers, inflammatory markers, heavy metals in toenails, melatonin in urine, and time-integrated fatty acid status in red blood cells. Additionally, the white blood cells provide a source of DNA for genome-wide association studies, sequencing, copy number variation, telomere length, and epigenetic analyses.
We test assay performance in our mailed samples (fit-for-purpose) before use in participant samples. First, we assess split-sample reproducibility using the interassay coefficient of variation. Second, because blood and urine were mailed to our central laboratory and processed the day after collection, we compare biomarker values in samples processed immediately versus 24 or 48 hours after collection11,12 using samples collected specifically for this purpose. Finally, we examine within-person stability over 1 to 3 years to confirm whether analytes measured in a single sample can be used to represent long-term status and thus are useful to study disease incidence prospectively.
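The two summary statistics that typically underlie checks of this kind can be sketched as follows. This is an illustration under stated assumptions, not the study's own procedure: the coefficient of variation is taken as 100 × SD / mean of replicate measurements, and within-person stability is summarized with a one-way random-effects intraclass correlation, a common choice that the text does not specify for biomarkers. The function names and the example values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV (%) of replicate measurements of one sample: 100 * SD / mean."""
    return 100.0 * stdev(values) / mean(values)

def icc_oneway(groups):
    """One-way random-effects intraclass correlation.

    groups: one list per person, each holding the same number k (>= 2)
    of repeated measurements.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 persons with the same k >= 2 measurements")
    person_means = [mean(g) for g in groups]
    grand_mean = mean([v for g in groups for v in g])
    # Between-person and within-person mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, person_means) for v in g
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical split-sample replicates of one quality-control sample.
print(round(coefficient_of_variation([9.0, 11.0]), 1))

# Hypothetical analyte measured twice, 1 to 3 years apart, in 4 women.
print(round(icc_oneway([[5.1, 5.3], [7.8, 8.1], [6.0, 5.8], [9.2, 9.0]]), 2))
```

A low coefficient of variation suggests the assay is reproducible in mailed samples, and an intraclass correlation close to 1 suggests that a single measurement ranks women consistently over time.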
NHS has one of the largest tumor tissue repositories nested within a prospective epidemiological study (Table 2). Our recovery rate ranges from 70% to 80% for formalin-fixed paraffin-embedded blocks and hematoxylin and eosin–stained slides. To conduct pilot studies of assay precision, we use remnant tissue blocks from local hospitals in excess of what is required for standard of care, covering a range of lesion types and years of diagnosis. In collaboration with the Dana-Farber and Harvard Cancer Center Core, we have constructed hundreds of tissue microarrays (TMAs), an efficient approach to measuring biomarkers using hundreds of samples on the same block.13 To date, we have constructed TMAs to study various biomarkers, such as tumor cell protein and immune cell expression; these include 44 TMAs for 5936 breast cancers, 10 TMAs for 1123 colorectal tumors, 3 TMAs for 250 ovarian cancers, and 7 TMAs for 958 benign breast disease and normal breast tissue specimens (an appendix containing related NHS and NHS II publications is available as a supplement to the online version of this article at http://www.ajph.org).
Our tumor tissue resources are unique because the corresponding lifestyle data and blood samples are available. The tumor tissue specimens allow us to evaluate hypotheses relating etiologic factors to specific somatic molecular characteristics and to evaluate interactions between exogenous factors and tumor molecular features in relation to tumor behavior, response to treatment, and survival.14–22 Furthermore, by providing a link between an etiologic factor and a specific tumor molecular subtype, these studies may eventually provide important support for individualized treatment approaches.
NHS has been at the forefront of research in mammographic density, 1 of the strongest risk factors for breast cancer.23–25 We collected mammograms from NHS participants with breast cancer and controls who provided a blood sample in 1989/90, allowing us to use biomarker data in analyses of mammographic density. We obtained prediagnostic mammograms from 1446 participants with breast cancer and 2406 controls.26 We digitize all film mammograms and are collecting prediagnostic mammograms for women who provided a second blood sample in the 2000 collection. This will provide a unique resource allowing us to assess changes in mammographic density (approximately 10 years apart) and its correlation with changes in circulating biomarkers and breast cancer risk. Furthermore, we are using novel imaging technology to characterize a variety of mammogram texture features (i.e., radiomics). Importantly, our measurement of mammographic density is highly reproducible, with a within-person intraclass correlation coefficient of 0.93.27
The Statistics Group and the Channing computer facility are key infrastructure resources for the management of our database as it grows with the continuous receipt of biennial questionnaires, new -omics data, and new additions to the nutrient composition database. The Statistics Group collaborates on complex statistical questions, develops statistical software, maintains and updates our growing databases (e.g., integrating new genome-wide association studies or -omics results), and leads regular seminars and training sessions, including teaching newer methods for integrative analyses of -omics data sets.
We have developed more than 30 custom-designed SAS macros for analyzing NHS data (http://www.hsph.harvard.edu/donna-spiegelman/software). These include basic macros, for example, reading in the data from multiple questionnaires, as well as more sophisticated tools, for example, using the risk set regression calibration method to correct for bias stemming from measurement error in baseline or time-varying exposures28 and direct statistical testing of heterogeneity of risk factor associations across disease subtypes using competing risks analysis (prospective studies) or polytomous logistic regression (nested case–control studies).
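As a rough illustration of the idea behind measurement error correction, the sketch below implements the simplest form of regression calibration for a single continuous exposure. It is not the risk set regression calibration method or any of the SAS macros mentioned above; it assumes a validation substudy in which both a reference measure and the error-prone questionnaire measure are available, and all function names and numbers are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def regression_calibration(beta_naive, reference, surrogate):
    """Correct a naive regression coefficient for exposure measurement error.

    The calibration factor (lambda) is the slope from regressing the
    reference measure on the error-prone surrogate in the validation
    substudy; the corrected coefficient is beta_naive / lambda.
    """
    lam = ols_slope(surrogate, reference)
    if lam == 0:
        raise ValueError("calibration slope is zero; correction undefined")
    return beta_naive / lam

# Hypothetical validation data: questionnaire (surrogate) vs. reference measure.
surrogate = [1.0, 2.0, 3.0, 4.0]
reference = [1.5, 2.0, 2.5, 3.0]   # reference rises 0.5 per unit of surrogate

# Hypothetical naive log relative risk per unit of the questionnaire measure.
beta_corrected = regression_calibration(0.2, reference, surrogate)
print(round(beta_corrected, 2))
```

In this toy example the calibration slope is 0.5, so the naive coefficient is doubled. A full analysis would also need to propagate the uncertainty in the calibration slope into the confidence interval, which this sketch omits.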
Most analyses are conducted by individual trainees supervised by study investigators, a key to our scientific productivity. Indeed, the NHS is a major resource for training the next generation in epidemiology, genetics, and nutrition, with more than 108 doctoral students and 100 postdoctoral fellows from Harvard and many other institutions thus far. We provide a user manual for the computer system, extensive documentation of statistical analyses, and sample computer programs. The macros help to ensure that analyses are done consistently and correctly across investigators with the most up-to-date statistical methods. Finally, to ensure high quality, all articles undergo rigorous review: project proposals and initial results are vetted at our biweekly meetings, and articles are reviewed for programming, technical accuracy, and scientific accuracy before they are approved for submission.
In addition to research data (biennial, supplemental, and substudy questionnaires; field study data; medical and death records; specimen data; -omics data; geocoding; genetics), there are extensive organizational data, including participant- and study-tracking information, the Biorepository’s Laboratory Information Management System, documentation of study protocols, and internal and external Web sites. A team of 35 data management experts, programmers, application developers, and information technology specialists oversees daily operations. They develop and maintain a suite of homegrown software applications, Oracle databases, Java applications, the Laboratory Information Management System, survey development apps, and Web sites. Currently, the cohort data are maintained on a private cluster consisting of 200 UNIX and Linux servers, a 400-terabyte multitiered Avere–Isilon storage system, and an EMC NetWorker backup system. The system is backed up daily, with offline and offsite permanent storage.
More than 90% of the funding for this large NHS infrastructure comes from federal grants.
To maximize the value of the NHSs’ resources and broaden the scope of research, we have facilitated data sharing and collaborations with external investigators, expanded the cohort to minorities and male nurses, and integrated new methodologies that provide innovative research dimensions.
All questionnaires, details about biospecimen collections, and guidelines for outside users are available on our public Web site (http://www.channing.harvard.edu/nhs). More than 220 external collaborations with researchers across the globe have begun over the past 10 years. Typically, an outside researcher prepares a brief proposal, and the NHS investigator group reviews the proposal to verify that there is no overlap with ongoing work and to identify a local investigator to assist with the project. This process is identical for local investigators.
Once a request is approved, we provide an introduction to our data and a secure password to access our de-identified data files. In addition, the NHS has contributed to more than 40 consortiums and pooling projects, including genome-wide association studies and other genetic studies (e.g., the Ovarian Cancer Association Consortium), plasma-based pooling studies (e.g., the Vitamin D Pooling Project of Rarer Cancers), and consortiums involving pooled questionnaire data (e.g., the Collaborative Group on Epidemiologic Studies of Female Cancers and the Pooling Project of Diet and Cancer).
The demographics of nursing have changed greatly since 1976, when NHS started, with increasing proportions of minorities and men. One of the key recruitment priorities for NHS3 is to increase minority participation. We conducted a targeted mailing of invitational postcards in 2012 to minority-dense zip codes and targeted licensed practical nurses and licensed vocational nurses. This doubled the enrollment rate of African American and Hispanic women compared with the rate from invitations originating from nursing organizations or colleagues already participating in the study.
Ongoing relationships with nursing organizations, such as the National Black Nurses Association, have also been key tools to reach potential participants of color. In 2015 we began recruiting men as primary study participants, and male partners of female NHS3 participants are included in some of the pregnancy substudies. Of note, the NHS cohorts are complemented by the parallel, ongoing Health Professionals Follow-up Study cohort of 51 529 male health professionals, which began in 1986.
With collaborators at the Dartmouth Institute for Health Policy and Clinical Practice, we are currently using the vast Centers for Medicare and Medicaid Services’ resources to obtain information regarding NHS participants’ health and health care utilization. First, claims data will help identify cancer and other diagnoses, especially among women who have been lost to follow-up. We will also obtain Medicare prescription data (Part D) for research in disease etiology and survival. These data will form the basis for a new line of research into health care costs and utilization.
With repeated exposure information both pre- and postdiagnosis, NHS is uniquely positioned to evaluate when a variable is important to survival during the disease process and offer key findings with actionable clinical implications. For example, postdiagnostic moderate physical activity (equivalent to walking 3 hours/week) was associated with 50% lower breast cancer mortality,29 postdiagnostic aspirin was associated with 50% lower mortality from breast cancer,30 and among colorectal cancer patients with COX-2 positive tumors, postdiagnostic aspirin was associated with 61% lower mortality from colorectal cancer.31 Future studies will evaluate novel tissue markers, such as our recent finding that the androgen receptor is an independent predictor of breast cancer survival.17 The continued administration of the SF-36 and several mental health scales will permit substantial research on functional health and well-being after cancer, with the unique ability to control for prediagnostic functioning.
NHS has applied metabolomics to the biology of various diseases, potentially uncovering novel pathways in etiology and new targets for intervention. In our pilot studies of the Massachusetts Institute of Technology and Harvard Broad Institute metabolomics platforms,32 the majority of metabolites performed well (coefficient of variation < 20% for 92% of metabolites). Intraclass correlation coefficients comparing samples processed immediately versus after 24 hours were ≥ 0.75 for approximately 75% of metabolites, indicating that our collection and processing methods will not interfere with findings for the majority of available metabolites. In samples collected from participants 2 years apart, reasonable long-term reproducibility (intraclass correlation coefficient ≥ 0.4) was observed for 90% of metabolites. Thus, most measured metabolites are excellent candidates for epidemiological studies. Recently, we found that elevated levels of branched-chain amino acids were associated with a more than 2-fold increased risk of pancreatic cancer, with stronger associations for participants diagnosed within 5 years of blood draw, suggesting that this may be an early marker of pancreatic cancer.33
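The two reproducibility measures cited above can be illustrated with a minimal sketch. It is not the pilot studies' actual analysis code: the function names, the choice of a one-way random-effects intraclass correlation coefficient, and all data values are assumptions made only for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Percent CV: sample standard deviation divided by the mean, times 100."""
    return 100.0 * stdev(replicates) / mean(replicates)


def icc_oneway(measurements):
    """One-way random-effects intraclass correlation coefficient.

    `measurements` holds one list per participant, each with the same
    number k >= 2 of repeated values (e.g., two blood draws).
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >= 2 participants with the same k >= 2 values each")
    grand_mean = mean(v for row in measurements for v in row)
    row_means = [mean(row) for row in measurements]
    # Between-participant and within-participant mean squares.
    msb = k * sum((rm - grand_mean) ** 2 for rm in row_means) / (n - 1)
    msw = sum(
        (v - rm) ** 2 for row, rm in zip(measurements, row_means) for v in row
    ) / (n * (k - 1))
    denominator = msb + (k - 1) * msw
    if denominator == 0:
        raise ValueError("ICC is undefined when all values are identical")
    return (msb - msw) / denominator


def share_meeting(values, predicate):
    """Fraction of values that satisfy a quality threshold."""
    return sum(1 for v in values if predicate(v)) / len(values)


# Hypothetical data: replicate quality-control measurements per metabolite.
qc_replicates = {
    "metabolite_a": [9.8, 10.1, 10.3],
    "metabolite_b": [4.0, 6.5, 5.1],
}
cvs = [coefficient_of_variation(r) for r in qc_replicates.values()]
print("share with CV < 20%:", share_meeting(cvs, lambda cv: cv < 20.0))

# Hypothetical data: one metabolite measured twice in four participants.
paired_draws = [[1.0, 1.2], [2.1, 1.9], [3.0, 3.3], [4.2, 3.9]]
print("ICC:", round(icc_oneway(paired_draws), 3))
```

Applied per metabolite, the same functions would yield the proportions of metabolites meeting each threshold (coefficient of variation < 20%, intraclass correlation coefficient ≥ 0.75 or ≥ 0.4). Other intraclass correlation definitions, such as two-way models, give somewhat different values.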
The NHS has sustained remarkable scientific productivity in the past 40 years, reflected in more than 1200 publications that have substantially influenced prevention recommendations by such organizations as the American Cancer Society, the American Heart Association, and the World Health Organization. Robust findings from the NHS contributed to the evidence base for developing US Dietary Guidelines to reduce intakes of trans fat, saturated fat, sugar-sweetened beverages, red and processed meats, and refined carbohydrates while promoting higher intake of healthy types of fats (e.g., unsaturated fats from vegetable oils, nuts and seeds, and seafood) and carbohydrates (e.g., whole grains, fruits, and vegetables).5 NHS studies on the benefits of physical activity contributed to the evidence base for the Physical Activity Guidelines for Americans.34 NHS studies of survivorship after cancer diagnosis also contributed to the Nutrition and Physical Activity Guidelines for Cancer Survivors.34
A fundamental reason for the impact of the NHSs is that they have been embedded in an academic environment with great strengths in many disciplines, particularly epidemiology, medicine, and biostatistics, and have been able to incorporate the energy and creativity of students, fellows, and faculty from across the whole of our biomedical community. Other major lessons learned include the value of identifying a loyal and interested cohort base and maintaining good contact with the participants, the value of repeated assessments over time, the commitment to training that provides interesting and innovative opportunities for young people to keep the research fresh, and the value of avoiding complacency and being alert to new opportunities and new methodologies to use the resources for maximum scientific and public health gain.
The combination of repeated measures of traditional risk factors along with diet, physical activity, lifestyle, and biochemical and genetic data in the NHS cohorts, as well as state-of-the-art mobile high-resolution measures, provides a resource for powerful etiologic and translational research. This type of big data within the context of a well-characterized cohort study is expected to uncover disease mechanisms and provide critical insight for public health prevention programs and personalized medicine.
ACKNOWLEDGMENTS
The Nurses’ Health Study (NHS) is supported by the National Institutes of Health (NIH; grants UM1 CA186107, P01 CA87969, R01 CA49449, R01 HL034594, and R01 HL088521). NHS II is supported by NIH grants UM1 CA176726 and R01 CA67262. Y. B. is supported by an NIH grant (P30 DK046200) and a KL2/Catalyst Medical Research Investigator Training award (an appointed KL2 award) from Harvard Catalyst/The Harvard Clinical and Translational Science Center (National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health award KL2 TR001100).
The authors would like to thank the participants and staff of the Nurses’ Health Study for their valuable contributions as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY.
HUMAN PARTICIPANT PROTECTION
The Nurses’ Health Studies were approved by the Human Research Committee at the Brigham and Women’s Hospital, Boston, MA, and participants provided written informed consent.