We have summarized the evolution of the Nurses’ Health Study (NHS), a prospective cohort study of 121 700 married registered nurses launched in 1976; NHS II, which began in 1989 and enrolled 116 430 nurses; and NHS3, which began in 2010 and has ongoing enrollment.

Over 40 years, these studies have generated long-term, multidimensional data, including lifestyle- and health-related information across the life course and an extensive repository of various biological specimens. We have described the questionnaire data collection, disease follow-up methods, biorepository resources, and data management and statistical procedures.

Through integrative analyses, these studies have sustained a high level of scientific productivity and substantially influenced public health recommendations. We have highlighted recent interdisciplinary research projects and discussed future directions for collaboration and innovation.

EDITOR’S NOTE: Because of space restrictions and the large volume of references relevant to the Nurses’ Health Study, additional references are provided in a supplement to the online version of this article at http://www.ajph.org.

Oral contraceptives were first marketed both in the United Kingdom and the United States in the early 1960s. By 1966, the Medical Research Council of Great Britain Statistical Research Unit, headed by Sir Richard Doll, was receiving reports of healthy young women who were suffering from thrombophlebitis and pulmonary emboli that were presumed to be associated with taking oral contraceptives. Frank Speizer and Martin Vessey, who were working in the unit, noted these reports and raised concerns that millions of healthy women were likely to be exposed to these drugs and that there were no plans to follow the potential long-term consequences of exogenous hormone use. To undertake proper investigations of this topic, it was clear that long-term follow-up of a large cohort of women would be required.

On the basis of the success of the British Doctors’ Study of Smoking,1 pilot studies were begun both in the United Kingdom and the United States, where doctors were asked to pass along to their spouses a mailed questionnaire. As it turned out, there were too few women married to British physicians in the appropriate age group who would have had an opportunity to use oral contraceptives to make such a study in the United Kingdom feasible.

With promising results from the US pilot studies, Speizer, who was at Harvard Medical School at the time, submitted a National Cancer Institute grant application that was eventually successful. However, by the time the funding was received in 1974, the follow-up questionnaire had been revised to cover a 2-year period and needed to be repiloted. In the second pilot mailing, questionnaires were addressed to 3 groups on the basis of responses from the first pilot: “Doctor, give this to your wife,” “Ms,” and “Mrs.” The worst response came from the “Ms” group. Women who were married and aged 30 to 55 years in 1974 did not like to be called “Ms.” The most important information came from the “Mrs” group, in that a substantial portion of the women in this group reported that they had not seen the initial questionnaire. Their physician husbands had presumably filled out the form for them. We therefore redirected the effort to nurses, because we believed it was important to have a medically sophisticated group of study participants.

In 1976, Speizer et al. mailed the baseline questionnaire to eligible married female registered nurses aged 30 to 55 years who resided in 1 of 11 US states with the largest number of registrants (New York, California, Pennsylvania, Ohio, Massachusetts, New Jersey, Michigan, Texas, Florida, Connecticut, and Maryland). A total of 121 700 nurses returned the questionnaire (71.2% of those invited), and the cohort reflected the racial composition of nurses at that time (97% White). Although the participants had a slightly higher socioeconomic status than did the general population and were mostly White, which may have initially affected generalizability, the population selection enhances internal validity because the nurses’ health knowledge and commitment to research contribute to high-quality and complete self-reported health data as well as high follow-up rates.

Women continue to be followed via biennial questionnaires, making the NHS the first cohort of its size with repeated data collection. The range of lifestyle and health outcome data collected has significantly expanded over time, beginning with the inclusion in 1980 of questions about physical activity and a semiquantitative food frequency questionnaire (FFQ),2 which was piloted and completed by the cohort under the direction of Walter Willett. The expanded collection of data was facilitated by the use of optically scanned questionnaires starting in 1982. After a detailed validation study was conducted, the FFQ was significantly expanded in 1984 and 1986. Since that time it has been repeated every 4 years. Additional behavioral and lifestyle risk factors have been repeatedly assessed over the years (see the box), an extensive biorepository has been established (Figure 1; Tables 1 and 2), and the array of chronic diseases and other outcomes has expanded steadily over time (Table 3).

Categories of Variables Included on Biennial Study Questionnaires: Nurses’ Health Studies
Employment status, shift work
Family history of cancer or heart or other disease
Reproductive history and menopause
Prescription and over-the-counter medications
Cancer and other screening tests
Leisure time physical activity, sedentary time
Sleep patterns
Alcohol use
Weight, height, waist, and hip measurements
Diet (including during adolescence)
Smoke exposure
Living arrangement
Neighborhood characteristics
Environmental exposures
Mental health
Social networks
Optimism scale
Caregiving and caregiving stress
Quality of life
Activities of daily living

TABLE 1— Biorepository Specimens: NHS and NHS II

| Study | Year | Sample Type | Description | No. Samples |
| --- | --- | --- | --- | --- |
| NHS | | | | |
| Main toenail cohort | 1982–1984 | Toenails | Nail clippings from all 10 toes | 68 213 |
| Main blood cohort | 1989–1990 | Blood | | 32 826 |
| Main blood cohort, 2nd collection | 2000–2002 | Blood, urine | Only women who provided a 1st collection blood sample; first morning spot urine | 18 743 |
| Reproducibility study | 1989–1990; 1991; 1992 | Blood | Repeat blood samples from the same participants | 227 |
| Nested folate trial | 1996; 1999 | Blood | Repeat blood samples from the same participants | 685 |
| Main cheek cohort | 2002–2004 | Buccal cells | Among women without blood samples | 33 040 |
| Renal function cohort | 2003; 2007–2008 | Blood, urine | Repeat blood and urine samples from the same participants | 1 992 |
| Cognitive function cohort | 2007 | Blood | | 130 |
| Diet and lifestyle validation study | 2009– | Blood, urine, saliva | Four 24-h urine specimens; 2 blood specimens | 375 |
| NHS II | | | | |
| Main blood cohort | 1996–1999 | Blood | Among premenopausal women, samples were collected during the follicular and midluteal phases of the menstrual cycle | 29 611 |
| Reproducibility study | 1996–1999; 2000; 2001 | Blood, urine | Repeat blood and urine samples from the same participant | 297 |
| Renal function cohort | 2003; 2008 | Blood, urine | Repeat blood and urine samples from the same participants | 1 847 |
| Main cheek cohort | 2004–2006 | Buccal cells | Among women without blood samples | 29 392 |
| Melatonin study | 2009 | 24-h urine | | 180 |
| Diet and lifestyle validation study | 2009– | Blood, urine, saliva | Four 24-h urine specimens; 2 blood specimens | 375 |
| Main blood cohort, 2nd collection | 2008–2011 | Blood, urine | Only women who provided a 1st collection blood sample; first morning spot urine | 17 275 |
| Diabetes and Women’s Health | 2012 | Blood, urine, toenails | Only women with a history of gestational diabetes mellitus | 2 089 |
| Mind Body Study | 2013–2014 | Blood, urine, timed saliva, stool, hair, toenails | Among the lifestyle validation study participants | 250 |

Note. NHS = Nurses’ Health Study. Blood includes plasma, white blood cell count, and red blood cell count.

TABLE 2— Tissue Collections (Blocks or Hematoxylin and Eosin Stained Slides): NHS and NHS II

| Disease | NHS Follow-Up Years | NHS Cases Collected, No. | NHS II Follow-Up Years | NHS II Cases Collected, No. |
| --- | --- | --- | --- | --- |
| Breast cancer | 1976–2010 | 7 459 | 1991–2011 | 2 705 |
| Colorectal cancer | 1980–2010 | 1 048 | 1991–2011 | 76 |
| Ovarian cancer | 1976–2010 | 486 | 1989–2011 | 131 |
| Melanoma | 1976–2004 | 343 | 1991–2009 | 140 |
| Pancreatic cancer | 1980–2000 | 173 | | 0 |
| Renal cancer | 1976–2004 | 163 | | 0 |
| Brain cancer | 1976–2000 | 88 | | 0 |
| Bladder cancer | 1980–2000 | 40 | | 0 |
| Colorectal adenoma | 1980–2008 | 485 | | 0 |
| Non-Hodgkin’s lymphoma | 1978–2010 | 385 | 1989–2011 | 84 |
| Benign breast disease | 1976–1998 | 3 826 | 1989–2011 | 3 289 |
| Barrett’s esophagus | 2000–2004 | 329 | | 0 |
| Endometrial | 1978–2012 | 475 | | 0 |
| Meningioma | 1992–2010 | 24 | 1993–2009 | 40 |

Note. NHS = Nurses’ Health Study. In NHS II, cancers such as pancreatic cancer and renal cancer are planned for the near future.

TABLE 3— List of Selected Outcomes Studied in the Nurses’ Health Study and Estimates of Total Available Cases

| Outcome | NHS Total Events, No. | NHS II Total Events, No. | Reference (See Online Appendix) |
| --- | --- | --- | --- |
| General | | | |
| Total mortality | 22 800 | 2 600 | Kawachi et al. (1993) |
| Weight | All participants | All participants | Fine et al. (1999) |
| Incident diabetes | 17 500 | 9 000 | Hu et al. (2001) |
| Incident primary hyperparathyroidism | 350 | NC | Vaidya et al. (2015) |
| Incident kidney stones | 1 500 | 1 800 | Ferraro et al. (2014) |
| Incident depression (diagnosis and treatment) | 11 500 | 12 500 | Chang et al. (2016) |
| Cancer outcomes | | | |
| Incident breast cancer | 13 000 | 5 300 | Hankinson et al. (1998) |
| Incident colorectal cancer | 3 000 | 400 | Giovannucci et al. (1995) |
| Incident endometrial (uterine) cancer | 2 000 | 700 | De Vivo et al. (2002) |
| Incident pancreatic cancer | 800 | 50 | Michaud et al. (2001) |
| Incident lung cancer | 3 600 | 250 | Feskanich et al. (2000) |
| Incident squamous cell skin cancer | 2 400 | 500 | Siiskonen et al. (2016) |
| Incident basal cell skin cancer | 23 700 | 8 900 | Wu et al. (2015) |
| Incident ovarian cancer | 1 300 | 290 | Hankinson et al. (1995) |
| Incident bladder cancer | 800 | 70 | McGrath et al. (2007) |
| Incident kidney and ureter cancer | 500 | 200 | Cho et al. (2013) |
| Incident esophageal cancer | 170 | 20 | Song et al. (2016) |
| Incident brain cancer | 300 | 30 | Holick et al. (2007) |
| Incident Hodgkin’s lymphoma | 100 | 50 | Abel et al. (2010) |
| Incident non-Hodgkin’s lymphoma | 1 400 | 400 | Zhang et al. (2000) |
| Incident melanoma | 1 200 | 500 | Wu et al. (2015) |
| Incident multiple myeloma | 300 | 40 | Birmann et al. (2007) |
| Incident leukemia | 250 | 80 | Schernhammer et al. (2012) |
| Cardiovascular | | | |
| Incident myocardial infarction | 7 000 | 700 | Mukamal et al. (2005) |
| Incident hypertension | 76 000 | 25 300 | Forman et al. (2008) |
| Incident stroke | 2 500 | 700 | Rexrode et al. (1997) |
| Incident sudden cardiac death | 500 | NC | Chiuve et al. (2011) |
| Incident peripheral artery disease | 140 | NC | Bertoia et al. (2013) |
| Eyes and eyesight | | | |
| Incident glaucoma | 1 000 | NC | Kang et al. (2010) |
| Incident macular degeneration | 1 800 | NC | Cho et al. (2000) |
| Incident cataracts | 4 200 | NC | Chasen-Taber et al. (1999) |
| Gastrointestinal | | | |
| Incident polyps | | | Platz et al. (2000) |
| Incident polyps: adenoma only | 5 200 | 3 800 | |
| Incident polyps: hyperplastic only | 3 100 | 3 300 | |
| Incident polyps: adenoma and hyperplastic | 1 300 | 900 | |
| Incident gastrointestinal bleeding | 1 700 | 800 | Huang et al. (2011) |
| Incident ulcerative colitis | 340 (NHS and NHS II combined) | | Ananthakrishnan et al. (2012) |
| Incident Crohn’s disease | 270 (NHS and NHS II combined) | | Ananthakrishnan et al. (2012) |
| Gynecologic | | | |
| Gestational diabetes | NC | 6 000 | Zhang et al. (2014) |
| Incident infertility | NC | 3 200 | Chavarro et al. (2007) |
| Incident pregnancy loss (spontaneous abortion and stillbirth) | NC | 4 500 | Gaskins et al. (2014) |
| Incident endometriosis | NC | 5 500 | Shah et al. (2013) |
| Incident uterine leiomyomata | NC | 9 800 | Terry et al. (2010) |
| Incident benign breast disease (centrally reviewed) | NC | 2 000 | Liu et al. (2013) |
| Premenstrual syndrome | NC | 1 300 | Bertone-Johnson et al. (2015) |
| Neurologic | | | |
| Cognitive function | 19 400 | NC | Weuve et al. (2004) |
| Incident amyotrophic lateral sclerosis | 150 | NC | Wang et al. (2011) |
| Incident multiple sclerosis | 210 | 350 | Munger et al. (2003) |
| Incident Parkinson’s disease | 700 | 80 | Simon et al. (2007) |
| Incident seizures or epilepsy | NC | 250 | Dworetzky et al. (2010) |
| Incident hearing loss | 19 000 | 12 000 | Curhan et al. (2015) |
| Restless leg syndrome | 900 | 4 200 | Li et al. (2012) |
| Pulmonary | | | |
| Incident chronic obstructive pulmonary disease | 5 700 | 2 500 | Varraso et al. (2007) |
| Incident asthma | 15 500 | 21 100 | Camargo et al. (1999) |
| Incident pulmonary embolism | 1 600 | 700 | Kabrhel et al. (2011) |
| Incident pneumonia | NC | 1 200 | Neuman et al. (2010) |
| Autoimmune and musculoskeletal disorders | | | |
| Incident gout | 1 000 | 370 | Hak et al. (2010) |
| Incident psoriasis | 1 600 | 1 600 | Wu et al. (2014) |
| Incident systemic lupus erythematosus | 200 | 130 | Costenbader et al. (2007) |
| Incident rheumatoid arthritis | 1 000 | 500 | Sparks et al. (2016) |
| Incident hip fractures | 3 400 | 700 | Meyer et al. (2016) |

Note. NC = not collected; NHS = Nurses’ Health Study. The table shows approximate estimates. The number of cases used in a specific study depends on the disease definition used, start of follow-up used, and the number of exclusions (e.g., missing the exposure). Outcome assessment start time and methods are not always consistent across endpoints and between cohorts.

In 1976, at the beginning of the NHS, the women who were aged 30 to 55 years would have had few opportunities to have used oral contraceptives for prolonged periods before their first pregnancy. To study the health effects of oral contraceptive use and other risk factors during early reproductive life, Willett et al. enrolled 116 430 nurses aged 25 to 42 years in 1989 and residing in 1 of 14 states (California, Connecticut, Indiana, Iowa, Kentucky, Massachusetts, Michigan, Missouri, New York, North Carolina, Ohio, Pennsylvania, South Carolina, and Texas), creating the Nurses’ Health Study II (NHS II). The baseline questionnaire included a color booklet of all brands and types of oral contraceptives ever sold in the United States and questions about lifetime oral contraceptives use, including details on duration and type, information that was not obtained in the NHS. Other new information was obtained on exposures in adolescence and early adult life, including physical activity, alcohol consumption, body fat profile, and diet.

Because of the development of computers and software capabilities, and to reduce costs, participants were given the option to respond to Web-based questionnaires beginning in 2001, and by 2011 70% of active NHS II participants had switched to online questionnaires. NHS II also established a large biorepository beginning with blood samples collected in 1996, including samples collected specifically during the luteal and follicular phases of the menstrual cycle. To investigate factors that influence weight change, 27 805 children (aged 9–14 years) of NHS II nurses were enrolled in their own follow-up study, the Growing Up Today Study. This study had 2 enrollment waves, in 1996 and 2004, and was initially under the leadership of Graham Colditz (now led by Stacey Missmer).

The Nurses’ Health Study 3 (NHS3), currently recruiting, began in 2010 (led by Jorge Chavarro) and has enrolled more than 40 000 female nurses aged 19 to 49 years residing throughout the United States and Canada, with 14% self-identifying as members of a racial or ethnic minority. Beginning in 2015, recruitment was extended to male nurses. NHS3 participants complete Web-based questionnaires every 6 months, collecting data on current exposures as well as exposures during adolescence, and in a substudy, participants report extensive information on exposures before, during, and immediately after pregnancy.

Active follow-up rates as of the writing of this article were 72% for the 6-month follow-up questionnaire, 82% for the 12-month follow-up questionnaire, between 90% and 94% for women who completed more than a year in the study, and 93% for women electing to answer pregnancy-specific questionnaires. Follow-up rates do not differ significantly by race/ethnicity and other demographic characteristics. NHS3 has taken advantage of new technologies (Web-based questionnaires and mobile devices), and the subsequent personalization of study timelines (e.g., reporting pregnancies in “real time”) has increased the focus on prospective data collection during key life course periods.

With continuous data collection since 1976, the NHSs have generated layers of resources, from behavioral data across the life course to various biological specimens. To accommodate the wealth of high-dimensional data and maximize cost efficiency, we have developed an effective infrastructure that supports questionnaire development and processing, cohort follow-up, an extensive biorepository, and data management and statistical analysis.

Questionnaires
Development.

Our biennial questionnaires have produced a unique resource of lifestyle- and health-related data collected continuously over the past 40 years (see the box). The implementation of optically scanned questionnaires in 1982 allowed faster, more efficient, and more reliable data entry, making room for growth in the scientific scope of the study with longer questionnaires and lower processing time and cost. In addition to questions regarding diet and lifestyle behaviors, we have regularly assessed anxiety, depression, optimism, and social networks and have administered the Medical Outcomes Short Form-36 (SF-36), a validated3 scale of functional health and well-being.

We have also developed measures of environmental exposures (e.g., air pollution and ultraviolet radiation) and neighborhood characteristics (e.g., socioeconomic status, walkability, access to green space) on the basis of residential histories. In 1989, NHS II participants additionally reported body size at ages 5, 10, and 20 years, and in 1997 they reported physical activity at ages 12 to 17 years; in 1998, a subset of 41% of the women completed a supplemental FFQ asking about high school diet. In 2001/2002, 39 904 mothers of NHS and NHS II (90% NHS II) participants provided detailed information about their prepregnancy, pregnancy, and early life experiences with our cohort members.4 These unique data give us the opportunity to study relations of disease risk to a comprehensive set of social, lifestyle, and environmental exposures at multiple important time points across the life course.

Diet and physical activity questionnaires.

Beginning in 1980 the NHS questionnaires included a validated FFQ every 2 to 4 years.5 This FFQ was expanded from 61 to 116 items in 1984 and to 131 items in 1991. The FFQ was further modified in NHS3 to include more detailed questions about cooking methods and is administered during key windows of exposure (e.g., pregnancy). A team of research dietitians works to maintain a complete and accurate database of the nutrient content of foods included on the FFQ by updating the nutrient components of foods that change over time (e.g., trans fat) and researching the nutrient content of new or reformulated products. Our data on the fatty acid composition of foods are unique because we have directly analyzed representative samples of foods every 4 years to account for changes in manufacturing. These data are used in combination with national consumption patterns that are analyzed each year that the FFQ is administered. As described in an accompanying article, the validity of these dietary questionnaires has been assessed extensively.5
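As a rough illustration of how a time-varying nutrient database can be combined with FFQ responses, the following minimal sketch multiplies each reported frequency by the per-serving nutrient content for the questionnaire year and sums over foods. The frequency categories, food names, and trans fat values are all hypothetical placeholders chosen for illustration; they are not taken from the NHS nutrient database, and the actual NHS computation may differ.

```python
# Hypothetical mapping from FFQ frequency categories to servings per day.
SERVINGS_PER_DAY = {
    "never": 0.0,
    "1-3/month": 2 / 30,
    "1/week": 1 / 7,
    "2-4/week": 3 / 7,
    "5-6/week": 5.5 / 7,
    "1/day": 1.0,
    "2-3/day": 2.5,
}

# Hypothetical nutrient table: grams of trans fat per serving, keyed by
# questionnaire year so that reformulated products get updated values.
TRANS_FAT_G_PER_SERVING = {
    1990: {"margarine": 0.9, "cookies": 1.2},
    2006: {"margarine": 0.1, "cookies": 0.3},
}


def daily_nutrient_intake(responses, year, nutrient_table=TRANS_FAT_G_PER_SERVING):
    """Sum (servings per day) x (nutrient per serving) over all reported foods."""
    contents = nutrient_table[year]
    return sum(
        SERVINGS_PER_DAY[frequency] * contents[food]
        for food, frequency in responses.items()
    )


# The same reported diet yields different intakes as food composition changes.
responses = {"margarine": "1/day", "cookies": "1/week"}
intake_1990 = daily_nutrient_intake(responses, 1990)
intake_2006 = daily_nutrient_intake(responses, 2006)
```

Keying the nutrient table by year is the point of the sketch: an identical response pattern maps to a lower estimated intake once a product has been reformulated.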

Similarly, the cohort questionnaires include repeated measures of physical activity using a modified Paffenbarger questionnaire. Participants report the average time per week over the last year spent doing a list of specific activities (e.g., jogging), selected to represent the most important contributors to total activity. The physical activity questionnaire was initially validated among 231 NHS II women who completed both 1-week activity recalls and 7-day activity diaries as comparison methods.6 For total physical activity, the correlation with the recalls was 0.79 and with the diaries it was 0.62. We are currently conducting additional validation studies that include participants in the 3 cohorts to further evaluate the validity of these methods.5

Follow-Up
Response rate.

Our research on various mailing strategies determined that certified mail was the most effective approach for reaching study participants who did not respond to an initial mailing; therefore, participants who did not respond to 3 regular mailings were re-sent the questionnaire via certified mail.7 Because of increasing costs, certified mail has been replaced with hand-addressed mailings for the majority of participants over the past 7 years, with a response rate of approximately 30% for the first hand-addressed mailing and approximately 15% for the second and final mailing after 6 to 7 regular mailings without a response. With all mailings, the overall response rate for participants as of 2012 is 86.2%. This high retention rate stems from many factors, including but not limited to the selection of motivated health professionals, frequent contact with participants, repeated mailings followed by hand-addressed letters to nonresponders, the use of a shorter questionnaire in the final mailings to nonresponders, and an annual participant newsletter.

Documentation of disease and deaths.

Participants report newly diagnosed diseases biennially on their follow-up questionnaires. For any new report of disease (Table 3 presents a list of selected outcomes), we ask permission to review medical records and collect pathology specimens. Once permission is obtained, our software generates repeated mailings to hospitals to obtain medical records.

In virtually every instance in which we have signed permission, we have obtained documentation of cancer diagnoses. Physicians blinded to questionnaire exposure data review all medical records to confirm self-reported diseases. Participants additionally report conditions and characteristics that are not confirmed by record review but have been shown to be validly self-reported among these health professionals, such as hypertension,8 high cholesterol,8 and weight.9 Furthermore, we systematically search the National Death Index and state tumor registries. Searching the National Death Index is a highly effective method for monitoring deaths,10 and it now provides all listed causes of death, reducing the need to obtain death certificates.

Biorepository
Blood, urine, cheek cells, toenails, stool, and saliva.

NHS and NHS II have a rich resource of biospecimens, including blood, urine, toenails, buccal cells, stool, and saliva (Figure 1; Tables 1 and 2), and the collection of repeated samples substantially increases statistical power for analyses of age and latency effects. The different biospecimens provide an extensive range of research opportunities; for example, we have examined plasma and urinary sex hormones, dietary biomarkers, inflammatory markers, heavy metals in toenails, melatonin in urine, and time-integrated fatty acid status in red blood cells. Additionally, the white blood cells provide a source of DNA for genome-wide association studies, sequencing, copy number variation, telomere length, and epigenetic analyses.

We test the assay performance in our mailed samples (fit-for-purpose) before use in participant samples: first, we assess split sample reproducibility using the interassay coefficient of variation. Second, because blood and urine were mailed to our central laboratory and processed the day after collection, we compare biomarker values in samples processed immediately versus 24 or 48 hours after collection11,12 using samples collected specifically for this purpose. Finally, we examine within-person stability over 1 to 3 years to confirm whether analytes measured in a single sample can be used to represent long-term status and thus are useful to study disease incidence prospectively.
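The two summary statistics named above can be sketched in a few lines. This is a minimal illustration, not the laboratory's actual procedure: it assumes the coefficient of variation is computed from replicate measurements of one split sample, and it uses a one-way random-effects intraclass correlation coefficient with the same number of repeated measurements per person. The function names and example values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """CV (%) = 100 x sample SD / mean for replicate measurements of one sample."""
    return 100.0 * stdev(replicates) / mean(replicates)


def icc_oneway(measurements):
    """One-way random-effects ICC for a list of per-person measurement lists.

    Assumes each person contributes the same number (k >= 2) of repeated
    measurements, e.g., samples collected 1 to 3 years apart.
    """
    n = len(measurements)
    k = len(measurements[0])
    grand_mean = mean([value for person in measurements for value in person])
    person_means = [mean(person) for person in measurements]
    # Between-person and within-person mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    ms_within = sum(
        (value - m) ** 2
        for person, m in zip(measurements, person_means)
        for value in person
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: one split sample, and three people measured twice each.
cv = coefficient_of_variation([9.0, 11.0])
icc = icc_oneway([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
```

In the hypothetical data, each person's two measurements are identical, so all variation is between people and the ICC equals 1; noisier repeated measurements would pull it toward 0.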

Tissue.

NHS has one of the largest tumor tissue repositories nested within a prospective epidemiological study (Table 2). Our recovery rate ranges from 70% to 80% for formalin-fixed paraffin-embedded blocks and hematoxylin and eosin–stained slides. To conduct pilot studies of assay precision, we use remnant tissue blocks from local hospitals in excess of what is required for standard of care, covering a range of lesion types and years of diagnosis. In collaboration with the Dana-Farber and Harvard Cancer Center Core, we have constructed hundreds of tissue microarrays (TMAs), an efficient approach to measuring biomarkers using hundreds of samples on the same block.13 To date, we have constructed TMAs to study various biomarkers, such as tumor cell protein and immune cell expression; these comprise 44 TMAs for 5936 breast cancers, 10 TMAs for 1123 colorectal tumors, 3 TMAs for 250 ovarian cancers, and 7 TMAs for 958 benign breast disease and normal breast tissue specimens (an appendix containing related NHS and NHS II publications is available as a supplement to the online version of this article at http://www.ajph.org).

Our tumor tissue resources are unique because the corresponding lifestyle data and blood samples are available. The tumor tissue specimens allow us to evaluate hypotheses relating etiologic factors to specific somatic molecular characteristics and to evaluate interactions between exogenous factors and tumor molecular features in relation to tumor behavior, response to treatment, and survival.14–22 Furthermore, by providing a link between an etiologic factor and a specific tumor molecular subtype, these studies may eventually provide important support for individualized treatment approaches.

Mammographic density.

NHS has been at the forefront of research in mammographic density, 1 of the strongest risk factors for breast cancer.23–25 We collected mammograms from NHS participants with breast cancer and controls who provided a blood sample in 1989/90, allowing us to use biomarker data in analyses of mammographic density. We obtained prediagnostic mammograms from 1446 participants with breast cancer and 2406 controls.26 We digitize all film mammograms and are collecting prediagnostic mammograms for women who provided a second blood sample in the 2000 collection. This will provide a unique resource allowing us to assess changes in mammographic density (approximately 10 years apart) and their correlation with changes in circulating biomarkers and breast cancer risk. Furthermore, we are using novel imaging technology to characterize a variety of mammogram texture features (i.e., radiomics). Importantly, our measurement of mammographic density is highly reproducible, with a within-person intraclass correlation coefficient of 0.93.27

Data Management and Statistical Analysis

The Statistics Group and the Channing computer facility are key infrastructure resources for the management of our database as it grows with the continuous receipt of biennial questionnaires, new -omics data, and new additions to the nutrient composition database. The Statistics Group collaborates on complex statistical questions, develops statistical software, maintains and updates our growing databases (e.g., integrating new genome-wide association studies or -omics results), and leads regular seminars and training sessions, including teaching newer methods for integrative analyses of -omics data sets.

We have developed more than 30 custom-designed SAS macros for analyzing NHS data (http://www.hsph.harvard.edu/donna-spiegelman/software). These include basic macros, for example, reading in the data from multiple questionnaires, as well as more sophisticated tools, for example, using the risk set regression calibration method to correct for bias stemming from measurement error in baseline or time-varying exposures28 and direct statistical testing of heterogeneity of risk factor associations across disease subtypes using competing risks analysis (prospective studies) or polytomous logistic regression (nested case–control studies).
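To give a sense of what a measurement error correction does, the sketch below shows the simplest classical form of regression calibration: a validation substudy relates the questionnaire measure to a reference measure, and the naive regression coefficient is divided by the resulting calibration slope. This is a deliberately simplified stand-in and is not the risk set regression calibration method implemented in the SAS macros; the function names and data are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def regression_calibration(beta_naive, questionnaire, reference):
    """Divide the naive coefficient by the slope of reference on questionnaire."""
    calibration_slope = ols_slope(questionnaire, reference)
    return beta_naive / calibration_slope


# Hypothetical validation substudy: questionnaire values vs. a reference method.
questionnaire = [1.0, 2.0, 3.0, 4.0]
reference = [1.5, 2.0, 2.5, 3.0]

# The calibration slope is 0.5 here, so a naive coefficient of 0.2
# is corrected to about 0.4.
beta_corrected = regression_calibration(0.2, questionnaire, reference)
```

The example shows the typical direction of the correction: when the questionnaire overstates differences between people in the underlying exposure (calibration slope below 1), the naive coefficient is attenuated and the corrected coefficient is larger in magnitude.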

Most analyses are conducted by individual trainees supervised by study investigators, a key to our scientific productivity. Indeed, the NHS is a major resource for training the next generation in epidemiology, genetics, and nutrition, with more than 108 doctoral students and 100 postdoctoral fellows from Harvard and many other institutions thus far. We provide a user manual for the computer system, extensive documentation of statistical analyses, and sample computer programs. The macros help to ensure that analyses are done consistently and correctly across investigators with the most up-to-date statistical methods. Finally, to ensure high quality, all articles undergo rigorous review: project proposals and initial results are vetted at our biweekly meetings, and articles are reviewed for programming, technical accuracy, and scientific accuracy before they are approved for submission.

In addition to research data (biennial, supplemental, and substudy questionnaires; field study data; medical and death records; specimen data; -omics data; geocoding; genetics), there are extensive organizational data, including participant- and study-tracking information, the Biorepository’s Laboratory Information Management System, documentation of study protocols, and internal and external Web sites. A team of 35 data management experts, programmers, application developers, and information technology specialists oversee daily operations. They develop and maintain a suite of homegrown software applications, Oracle databases, Java applications, the Laboratory Information Management System, survey development apps, and Web sites. Currently, the cohort data are maintained on a private cluster consisting of 200 UNIX and Linux servers, a 400-terabyte multitiered Avere–Isilon storage system, and an EMC NetWorker backup system. The system is backed up daily, with offline and offsite permanent storage.

More than 90% of the funding for this large NHS infrastructure comes from federal grants.

FUTURE DIRECTIONS

To maximize the value of the NHSs’ resources and broaden the scope of research, we have facilitated data sharing and collaborations with external investigators, expanded the cohort to minorities and male nurses, and integrated new methodologies that provide innovative research dimensions.

External Collaborations

All questionnaires, details about biospecimen collections, and guidelines for outside users are available on our public Web site (http://www.channing.harvard.edu/nhs). More than 220 external collaborations with researchers across the globe have begun over the past 10 years. Typically, an outside researcher prepares a brief proposal, and the NHS investigator group reviews the proposal to verify that there is no overlap with ongoing work and to identify a local investigator to assist with the project. This process is identical for local investigators.

Once a request is approved, we provide an introduction to our data and a secure password to access our de-identified data files. In addition, the NHS has contributed to more than 40 consortiums and pooling projects, including genome-wide association studies and other genetic studies (e.g., the Ovarian Cancer Association Consortium), plasma-based pooling studies (e.g., the Vitamin D Pooling Project of Rarer Cancers), and consortiums involving pooled questionnaire data (e.g., the Collaborative Group on Epidemiologic Studies of Female Cancers and the Pooling Project of Diet and Cancer).

Recruiting Minorities and Male Nurses

The demographics of nursing have changed greatly since 1976, when NHS started, with increasing proportions of minorities and men. One of the key recruitment priorities for NHS3 is to increase minority participation. In 2012, we conducted a targeted mailing of invitational postcards to minority-dense zip codes and to licensed practical nurses and licensed vocational nurses. This doubled the enrollment rate of African American and Hispanic women relative to the rate achieved with invitations originating from nursing organizations or colleagues already participating in the study.

Ongoing relationships with nursing organizations, such as the National Black Nurses Association, have also been key tools to reach potential participants of color. In 2015 we began recruiting men as primary study participants, and male partners of female NHS3 participants are included in some of the pregnancy substudies. Of note, the NHS cohorts are complemented by the parallel, ongoing Health Professionals Follow-up Study cohort of 50 529 male health professionals, which began in 1986.

Innovations
Linkage with Medicare data.

With collaborators at the Dartmouth Institute for Health Policy and Clinical Practice, we are currently using the vast Centers for Medicare and Medicaid Services’ resources to obtain information regarding NHS participants’ health and health care utilization. First, claims data will help identify cancer and other diagnoses, especially among women who have been lost to follow-up. Second, we will obtain Medicare prescription data (Part D) for research in disease etiology and survival. These data will form the basis for a new line of research into health care costs and utilization.

Cancer survival.

With repeated exposure information both pre- and postdiagnosis, NHS is uniquely positioned to evaluate when a variable is important to survival during the disease process and offer key findings with actionable clinical implications. For example, postdiagnostic moderate physical activity (equivalent to walking 3 hours/week) was associated with 50% lower breast cancer mortality,29 postdiagnostic aspirin was associated with 50% lower mortality from breast cancer,30 and among colorectal cancer patients with COX-2 positive tumors, postdiagnostic aspirin was associated with 61% lower mortality from colorectal cancer.31 Future studies will evaluate novel tissue markers, such as our recent finding that the androgen receptor is an independent predictor of breast cancer survival.17 The continued administration of the SF-36 and several mental health scales will permit substantial research on functional health and well-being after cancer, with the unique ability to control for prediagnostic functioning.

Metabolomics.

NHS has applied metabolomics to the biology of various diseases, potentially uncovering novel pathways in etiology and new targets for intervention. In our pilot studies of the Broad Institute of MIT and Harvard metabolomics platforms,32 the majority of metabolites performed well (coefficient of variation < 20% for 92% of metabolites). Intraclass correlation coefficients comparing samples processed immediately versus after 24 hours were ≥ 0.75 for approximately 75% of metabolites, indicating that our collection and processing methods will not interfere with findings for the majority of available metabolites. In samples collected from participants 2 years apart, reasonable long-term reproducibility (intraclass correlation coefficient ≥ 0.4) was observed for 90% of metabolites. Thus, most measured metabolites are suitable candidates for epidemiological studies. Recently, we found that elevated levels of branched-chain amino acids were associated with a more than 2-fold increased risk of pancreatic cancer, with stronger associations for participants diagnosed within 5 years of blood draw, suggesting that this may be an early marker of pancreatic cancer.33
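The two reproducibility measures quoted above can be illustrated with a minimal sketch. It assumes a one-way random-effects intraclass correlation coefficient; the estimator actually used in the pilot studies is not stated here, and the function names and input values are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Return the CV in percent: sample SD divided by the mean, times 100."""
    return stdev(replicates) / mean(replicates) * 100


def icc_oneway(groups):
    """One-way random-effects ICC for n participants with k measurements each.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n = len(groups)
    k = len(groups[0])
    grand_mean = mean([x for g in groups for x in g])
    # Between-participant mean square.
    msb = k * sum((mean(g) - grand_mean) ** 2 for g in groups) / (n - 1)
    # Within-participant mean square.
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical values, for illustration only (not NHS data).
qc_replicates = [98.0, 102.0, 105.0, 95.0]
paired_samples = [[4.1, 4.3], [6.0, 5.6], [5.2, 5.5], [7.1, 6.8]]

cv = coefficient_of_variation(qc_replicates)
icc = icc_oneway(paired_samples)
print(f"CV = {cv:.1f}% (below 20% threshold: {cv < 20})")
print(f"ICC = {icc:.2f} (at least 0.75: {icc >= 0.75}; at least 0.4: {icc >= 0.4})")
```

In this sketch, a metabolite would be screened once per comparison: the coefficient of variation from replicate quality-control samples, and the intraclass correlation coefficient from paired samples (immediate versus delayed processing, or collections 2 years apart).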

The NHS has sustained remarkable scientific productivity in the past 40 years, reflected in more than 1200 publications that have substantially influenced prevention recommendations by such organizations as the American Cancer Society, the American Heart Association, and the World Health Organization. Robust findings from the NHS contributed to the evidence base for developing US Dietary Guidelines to reduce intakes of trans fat, saturated fat, sugar-sweetened beverages, red and processed meats, and refined carbohydrates while promoting higher intake of healthy types of fats (e.g., unsaturated fats from vegetable oils, nuts and seeds, and seafood) and carbohydrates (e.g., whole grains, fruits, and vegetables).5 NHS studies on the benefits of physical activity contributed to the evidence base for the Physical Activity Guidelines for Americans.34 NHS studies of survivorship after cancer diagnosis also contributed to the Nutrition and Physical Activity Guidelines for Cancer Survivors.34

A fundamental reason for the impact of the NHSs is that they have been embedded in an academic environment with great strengths in many disciplines, particularly epidemiology, medicine, and biostatistics, and have been able to incorporate the energy and creativity of students, fellows, and faculty from across the whole of our biomedical community. Other major lessons learned include the value of identifying a loyal and interested cohort base and maintaining good contact with the participants, the value of repeated assessments over time, the commitment to training that provides interesting and innovative opportunities for young people to keep the research fresh, and the value of avoiding complacency and being alert to new opportunities and new methodologies to use the resources for maximum scientific and public health gain.

The combination of repeated measures of traditional risk factors along with diet, physical activity, lifestyle, and biochemical and genetic data in the NHS cohorts, as well as state-of-the-art mobile high-resolution measures, provides a resource for powerful etiologic and translational research. This type of big data within the context of a well-characterized cohort study is expected to uncover disease mechanisms and provide critical insight for public health prevention programs and personalized medicine.

ACKNOWLEDGMENTS

The Nurses’ Health Study (NHS) is supported by the National Institutes of Health (NIH; grants UM1 CA186107, P01 CA87969, R01 CA49449, R01 HL034594, and R01 HL088521). NHS II is supported by NIH grants UM1 CA176726 and R01 CA67262. Y. B. is supported by an NIH grant (P30 DK046200) and a KL2/Catalyst Medical Research Investigator Training award (an appointed KL2 award) from Harvard Catalyst/The Harvard Clinical and Translational Science Center (National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health award KL2 TR001100).

The authors would like to thank the participants and staff of the Nurses’ Health Study for their valuable contributions as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY.

HUMAN PARTICIPANT PROTECTION

The Nurses’ Health Studies were approved by the Human Research Committee at the Brigham and Women’s Hospital, Boston, MA, and participants provided written informed consent.

References

1. Doll R, Hill AB. The mortality of doctors in relation to their smoking habits; a preliminary report. BMJ. 1954;1(4877):1451–1455.
2. Salvini S, Hunter DJ, Sampson L, et al. Food-based validation of a dietary questionnaire: the effects of week-to-week variation in food consumption. Int J Epidemiol. 1989;18(4):858–867.
3. Brazier JE, Harper R, Jones NM, et al. Validating the SF-36 health survey questionnaire: new outcome measure for primary care. BMJ. 1992;305(6846):160–164.
4. Michels KB, Willett WC, Graubard BI, et al. A longitudinal study of infant feeding and obesity throughout life course. Int J Obes (Lond). 2007;31(7):1078–1085.
5. Hu FB, Satija A, Rimm EB, et al. Diet assessment methods in the Nurses’ Health Studies and contribution to evidence-based nutritional policies and guidelines. Am J Public Health. 2016;106(9):1567–1572.
6. Wolf AM, Hunter D, Colditz GA, et al. Reproducibility and validity of a self-administered physical activity questionnaire. Int J Epidemiol. 1994;23(5):991–999.
7. Rimm EB, Stampfer MJ, Colditz GA, Giovannucci E, Willett WC. Effectiveness of various mailing strategies among nonrespondents in a prospective cohort study. Am J Epidemiol. 1990;131(6):1068–1071.
8. Colditz GA, Martin P, Stampfer MJ, et al. Validation of questionnaire information on risk factors and disease outcomes in a prospective cohort study of women. Am J Epidemiol. 1986;123(5):894–900.
9. Rimm EB, Stampfer MJ, Colditz GA, Chute CG, Litin LB, Willett WC. Validity of self-reported waist and hip circumferences in men and women. Epidemiology. 1990;1(6):466–473.
10. Rich-Edwards JW, Corsano KA, Stampfer MJ. Test of the National Death Index and Equifax Nationwide Death Search. Am J Epidemiol. 1994;140(11):1016–1019.
11. Hankinson SE, London SJ, Chute CG, et al. Effect of transport conditions on the stability of biochemical markers in blood. Clin Chem. 1989;35(12):2313–2316.
12. Schernhammer ES, Hankinson SE. Urinary melatonin levels and breast cancer risk. J Natl Cancer Inst. 2005;97(14):1084–1087.
13. Camp RL, Charette LA, Rimm DL. Validation of tissue microarray technology in breast carcinoma. Lab Invest. 2000;80(12):1943–1949.
14. Collins LC, Cole KS, Marotti JD, Hu R, Schnitt SJ, Tamimi RM. Androgen receptor expression in breast cancer in relation to molecular phenotype: results from the Nurses’ Health Study. Mod Pathol. 2011;24(7):924–931.
15. Collins LC, Marotti JD, Baer HJ, Tamimi RM. Comparison of estrogen receptor results from pathology reports with results from central laboratory testing. J Natl Cancer Inst. 2008;100(3):218–221.
16. Holmes MD, Chen WY, Schnitt SJ, et al. COX-2 expression predicts worse breast cancer prognosis and does not modify the association with aspirin. Breast Cancer Res Treat. 2011;130(2):657–662.
17. Hu R, Dawood S, Holmes MD, et al. Androgen receptor expression and breast cancer survival in postmenopausal women. Clin Cancer Res. 2011;17(7):1867–1874.
18. Liu Y, Tamimi RM, Collins LC, et al. The association between vascular endothelial growth factor expression in invasive breast cancer and survival varies with intrinsic subtypes and use of adjuvant systemic therapy: results from the Nurses’ Health Study. Breast Cancer Res Treat. 2011;129(1):175–184.
19. Santagata S, Hu R, Lin NU, et al. High levels of nuclear heat-shock factor 1 (HSF1) are associated with poor prognosis in breast cancer. Proc Natl Acad Sci USA. 2011;108(45):18378–18383.
20. Tamimi RM, Baer HJ, Marotti J, et al. Comparison of molecular phenotypes of ductal carcinoma in situ and invasive breast cancer. Breast Cancer Res. 2008;10(4):R67.
21. Ogino S, Chan AT, Fuchs CS, Giovannucci E. Molecular pathological epidemiology of colorectal neoplasia: an emerging transdisciplinary and interdisciplinary field. Gut. 2011;60(3):397–411.
22. Ogino S, Stampfer M. Lifestyle factors and microsatellite instability in colorectal cancer: the evolving field of molecular pathological epidemiology. J Natl Cancer Inst. 2010;102(6):365–367.
23. Byrne C, Schairer C, Wolfe J, et al. Mammographic features and breast cancer risk: effects with time, age, and menopause status. J Natl Cancer Inst. 1995;87(21):1622–1629.
24. Boyd NF, Lockwood GA, Martin LJ, Byng JW, Yaffe MJ, Tritchler DL. Mammographic density as a marker of susceptibility to breast cancer: a hypothesis. IARC Sci Publ. 2001;154:163–169.
25. Boyd NF, Martin LJ, Stone J, Greenberg C, Minkin S, Yaffe MJ. Mammographic densities as a marker of human breast cancer risk and their use in chemoprevention. Curr Oncol Rep. 2001;3(4):314–321.
26. Tamimi RM, Hankinson SE, Colditz GA, Byrne C. Endogenous sex hormone levels and mammographic density among postmenopausal women. Cancer Epidemiol Biomarkers Prev. 2005;14(11 pt 1):2641–2647.
27. Tamimi RM, Byrne C, Colditz GA, Hankinson SE. Endogenous hormone levels, mammographic density, and subsequent risk of breast cancer in postmenopausal women. J Natl Cancer Inst. 2007;99(15):1178–1187.
28. Liao X, Zucker DM, Li Y, Spiegelman D. Survival analysis with error-prone time-varying covariates: a risk set calibration approach. Biometrics. 2011;67(1):50–58.
29. Holmes MD, Chen WY, Feskanich D, Kroenke CH, Colditz GA. Physical activity and survival after breast cancer diagnosis. JAMA. 2005;293(20):2479–2486.
30. Holmes MD, Chen WY, Li L, Hertzmark E, Spiegelman D, Hankinson SE. Aspirin intake and survival after breast cancer. J Clin Oncol. 2010;28(9):1467–1472.
31. Chan AT, Ogino S, Fuchs CS. Aspirin use and survival after diagnosis of colorectal cancer. JAMA. 2009;302(6):649–658.
32. Townsend MK, Clish CB, Kraft P, et al. Reproducibility of metabolomic profiles among men and women in 2 large cohort studies. Clin Chem. 2013;59(11):1657–1667.
33. Mayers JR, Wu C, Clish CB, et al. Elevation of circulating branched-chain amino acids is an early event in human pancreatic adenocarcinoma development. Nat Med. 2014;20(10):1193–1198.
34. Colditz GA, Philpott SE, Hankinson SE. The impact of the Nurses’ Health Study on population health: prevention, translation, and control. Am J Public Health. 2016;106(9):1540–1545.

ARTICLE CITATION

Ying Bao, MD, ScD, Monica L. Bertoia, PhD, MPH, Elizabeth B. Lenart, PhD, Meir J. Stampfer, MD, DrPH, Walter C. Willett, MD, DrPH, Frank E. Speizer, MD, and Jorge E. Chavarro, MD, ScD

Ying Bao, Meir J. Stampfer, and Frank E. Speizer are with the Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA. Monica L. Bertoia, Elizabeth B. Lenart, Walter C. Willett, and Jorge E. Chavarro are with the Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA.

“Origin, Methods, and Evolution of the Three Nurses’ Health Studies”, American Journal of Public Health 106, no. 9 (September 1, 2016): pp. 1573-1581.

https://doi.org/10.2105/AJPH.2016.303338

PMID: 27459450