Developing an evidence base for making public health decisions will require using data from evaluation studies with randomized and nonrandomized designs. Assessing individual studies and using them in quantitative research syntheses require transparent reporting of each study, with sufficient detail and clarity that differences and similarities among studies in the same area can be readily seen. The Consolidated Standards of Reporting Trials (CONSORT) statement provides guidelines for the transparent reporting of randomized clinical trials.
We present the initial version of the Transparent Reporting of Evaluations with Nonrandomized Designs (TREND) statement. These guidelines emphasize the reporting of theories used and descriptions of intervention and comparison conditions, research design, and methods of adjusting for possible biases in evaluation studies that use nonrandomized designs.
OVER THE PAST SEVERAL decades, a strong movement toward evidence-based medicine has emerged.1–3 In the context of evidence-based medicine, clinical decisions are based on the best available scientific data rather than on customary practices or the personal beliefs of the health care provider. There is now a parallel movement toward evidence-based public health practices.4,5 This movement is intended to use the best available scientific knowledge as the foundation for public health–related decision making.
In the context of evidence-based medicine, the randomized controlled trial (RCT) is usually considered of greatest evidentiary value for assessing the efficacy of interventions. Indeed, the preference for this design is sufficiently strong that when empirical evidence from RCTs is available, “weaker” designs are often considered to be of little or no evidentiary value. In this issue, Victora et al.6 make a strong argument that evidence-based public health will necessarily involve the use of research designs other than RCTs. Most important, they argue that RCTs are often not practical or not ethical for evaluating many public health interventions and discuss methods for drawing causal inferences from nonrandomized evaluation designs (“plausibility” and “adequacy” designs in their terminology).
Also in this issue, Donner and Klar,7 Murray et al.,8 and Varnell et al.9 provide overviews of the benefits and pitfalls of the group-randomized trial, which, in some situations, may be a reasonable alternative to the RCT. There are also a wide variety of nonrandomized evaluation designs that can contribute important data on the efficacy or effectiveness of interventions, such as quasi-experimental designs,10 nonrandomized trials, and natural experiments. Including these types of designs in developing evidence-based recommendations can provide a more integrated picture of the existing evidence and could help to strengthen public health practice. Excluding data collected under such designs would undoubtedly bias the evidence base toward interventions that are “easier” to evaluate but not necessarily more effective or cost-effective.
If nonrandomized designs are to be systematically used in building evidence-based public health practices, it will be necessary to improve the reporting quality of these types of studies. The transparency, or clarity, in the reporting of individual studies is key. Sufficient detail and clarity in the report allow readers to understand the conduct and findings of the intervention study and how the study was different from or similar to other studies in the field.
Furthermore, evidence-based practice may often rely on meta-analyses of large numbers of studies, some of which may report negative results. Meta-analysis requires full reporting of methods and outcomes to enable assessment of the comparability of different studies. Inadequate, or nontransparent, reporting may make it difficult to identify the variables that affect intervention outcomes and the elements central to intervention success or failure across multiple studies.
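To illustrate concretely what full reporting of outcomes can mean for a synthesist (the following estimator is offered only as an illustration and is not part of the TREND statement), pooling a continuous outcome across studies typically requires the per-condition sample sizes, means, and standard deviations, from which a standardized mean difference and its variance can be computed:

$$
d \;=\; \frac{\bar{x}_{I} - \bar{x}_{C}}{s_{p}}, \qquad
s_{p} \;=\; \sqrt{\frac{(n_{I}-1)s_{I}^{2} + (n_{C}-1)s_{C}^{2}}{n_{I}+n_{C}-2}}, \qquad
\widehat{\operatorname{Var}}(d) \;\approx\; \frac{n_{I}+n_{C}}{n_{I}\,n_{C}} + \frac{d^{2}}{2\,(n_{I}+n_{C})},
$$

where the subscripts I and C denote the intervention and comparison conditions. If any of these quantities goes unreported, a study cannot be combined with others without contacting the authors or making additional assumptions.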
In recent years, efforts have been made to improve the quality of reporting of RCTs. The Consolidated Standards of Reporting Trials (CONSORT) statement11 provides a 22-item checklist and subject flow chart for the transparent reporting of RCTs. This statement has been adopted as a framework for the reporting of RCTs by a large number of medical, clinical, and psychological journals (153, according to http://www.consort-statement.org, as of September 16, 2003). Use of the CONSORT statement has improved the quality of RCT reports over the past several years.12 As yet, however, there is no agreed-upon framework for the transparent reporting of nonrandomized research evaluations.
The HIV/AIDS Prevention Research Synthesis (PRS) team of the Centers for Disease Control and Prevention (CDC) has been synthesizing evidence from HIV behavioral intervention studies involving RCT and nonrandomized designs. The PRS team found that many study reports failed to include critical information (e.g., intervention timing and dosage, effect size data) necessary for research syntheses.13–17 To improve their ability to synthesize HIV behavioral prevention research, the PRS team convened the CDC’s Journal Editors Meeting in Atlanta, Ga, on July 24–25, 2003; this meeting was attended by editors and representatives of 18 journals that publish HIV behavioral intervention studies (a complete list of the journals is available from the authors and at http://www.TREND-statement.org). The main goals of the meeting were to (1) communicate the usefulness and importance of adequate reporting standards, (2) reach consensus on reporting standards for behavioral interventions, (3) develop a checklist of reporting standards to guide authors and journal reviewers, and (4) develop strategies to disseminate the resulting reporting standards.
The discussions at the meeting broadened beyond HIV behavioral interventions to include standardized reporting of behavioral and public health interventions in general. There was strong consensus at the meeting on the need for more standardized and transparent reporting of research evaluations that use designs other than randomization, particularly those with some form of comparison group. This agreement was reached with the understanding that additional input would be needed from a wide variety of researchers, other journal editors, and practitioners in the public health field before a final set of reporting standards is adopted.
Table 1 presents a proposed checklist—the Transparent Reporting of Evaluations with Nonrandomized Designs (TREND) checklist—for reporting standards of behavioral and public health intervention evaluations involving nonrandomized designs. The TREND checklist is meant to be consistent with the CONSORT checklist for the reporting of RCTs. Items presented in boldface type in the table are particularly relevant to behavioral and public health intervention studies, whether or not randomized designs are used. Thus, we would suggest that they be used to expand the information requested by CONSORT for RCTs of behavioral and public health interventions. Some of the items (8, 10, and 15) presented in the proposed TREND checklist are not relevant to RCTs and, thus, not included in the CONSORT checklist, but they are extremely relevant to nonrandomized designs. We also refer readers to CONSORT elaboration reports11,18 that provide rationales and examples for items in Table 1 that are shared with the CONSORT checklist.
The TREND checklist is proposed for intervention evaluation studies using nonrandomized designs, not for all research using nonrandomized designs. Intervention evaluation studies would necessarily include (1) a defined intervention that is being studied and (2) a research design that provides for an assessment of the efficacy or effectiveness of the intervention. Thus, our proposed checklist emphasizes description of the intervention, including the theoretical base; description of the comparison condition; full reporting of outcomes; and inclusion of information related to the design needed to assess possible biases in the outcome data. Brief comments may be helpful for a few of the items included in the proposed TREND checklist.
• Use of theory (item 2). Behavioral and social science theories provide a framework for generating cumulative knowledge. Thus, it would be very helpful to include references to the theoretical bases of the intervention being evaluated. This would permit identification of theories that are useful in developing interventions in different fields. Some interventions, however, are based on atheoretical needs assessments or simply the experience of the individuals who designed the intervention. In these situations, a post hoc application of a theory is not likely to be helpful.
• Description of the intervention condition and the services provided in a comparison condition (item 4). Although space is limited in many journals, it is still critical to provide sufficient detail so that a reader has an understanding of the content and delivery of both the experimental intervention and the services in the comparison condition. For example, “usual care” is not a helpful description of a comparison condition.
• Description of the research design (item 8). We recognize that there can be meaningful disagreement about what research design was actually used in an intervention study, including whether an RCT design was used. To minimize confusion, it would be helpful for authors to specify the design they intended to use, particularly the method of assignment, and any variations or deviations from the design.
It is important to note that the TREND checklist is not intended to serve as a criterion for evaluating papers for publication. Rather, it is intended to improve the quality of data reporting in peer-reviewed publications so that the conduct and the findings of research are transparent. As the volume of the public health literature continues to expand, research synthesis becomes an increasingly important tool for creating a cumulative body of knowledge and making evidence-based recommendations about effective interventions. Reporting standards will help ensure that fewer intervention trials with nonrandomized designs are missing information critical for research synthesis and that comparable information across studies can be more easily consolidated and translated into generalizable knowledge and practice.
We recognize several challenges in promoting and disseminating reporting standards for nonrandomized intervention evaluations. Most important, the TREND checklist is only a suggested set of guidelines and should be considered a work in progress. Improvements will almost certainly be necessary, adaptations may be needed to refine the standards for specific fields of intervention research, and additional specifications will likely be needed for particular types of nonrandomized evaluation designs. Furthermore, page limitations in many journals create strong pressure toward shorter rather than longer articles. Several alternatives for resolving this space issue were recommended at the CDC’s Journal Editors Meeting, such as posting additional information from a published study on the journal’s or an author’s Web site or having authors send additional information to relevant research synthesis groups or a central repository.
Finally, the process has so far involved only CDC scientists and journal editors in a single meeting, along with the preparation of this commentary. Although many of the journal editors involved are notable researchers in the fields of HIV, public health, and drug abuse prevention, we realize that successful promotion and dissemination of these guidelines must involve an ongoing dialogue extended to a large number of other researchers, methodologists, and statisticians across various health-related research fields.
In an effort to initiate this dialogue, we invite all editors, reviewers, authors, and readers to provide comments and feedback to help us revise the standards. Comments can be sent to [email protected], and the TREND group will periodically revise the guidelines accordingly. Also, journals are encouraged to endorse this effort by publishing editorials or commentaries on the TREND statement or by referencing it in their publication guidelines for authors and reviewers. To increase accessibility and ease of use, the revised versions of the TREND statement will be posted on an open access Web site (http://www.TREND-statement.org).
If the movement toward evidence-based public health is to succeed, it will be necessary to improve our ability to synthesize research on public health interventions. As Victora and colleagues note, this will include using data from intervention evaluations that do not involve randomized designs. The TREND statement presented here is proposed as a first step toward developing standardized and transparent reporting for nonrandomized intervention research evaluations in public health–related fields.
TABLE 1—Transparent Reporting of Evaluations with Nonrandomized Designs (TREND) Checklist

| Paper Section/Topic | Item No. | Descriptor | Examples From HIV Behavioral Prevention Research |
|---|---|---|---|
| Title and abstract | 1 | • Information on how units were allocated to interventions | Example (title): A nonrandomized trial of a clinic-based HIV counseling intervention for African American female drug users |
| | | • Structured abstract recommended | |
| | | • Information on target population or study sample | |
| Introduction | | • Scientific background and explanation of rationale | |
| Background | 2 | • Theories used in designing behavioral interventions | Example (theory used): the community-based AIDS intervention was based on social learning theory |
| Methods | | | |
| Participants | 3 | • Eligibility criteria for participants, including criteria at different levels in recruitment/sampling plan (e.g., cities, clinics, subjects) | |
| | | • Method of recruitment (e.g., referral, self-selection), including the sampling method if a systematic sampling plan was implemented | Example (sampling method): using an alphanumeric sorted list of possible venues and times for identifying eligible subjects, every tenth venue–time unit was selected for the location and timing of recruitment |
| | | • Recruitment setting | Examples (recruitment setting): subjects were approached by peer opinion leaders during conversations at gay bars |
| | | • Settings and locations where the data were collected | |
| Interventions | 4 | • Details of the interventions intended for each study condition and how and when they were actually administered, specifically including: | |
| | | Content: what was given? | |
| | | Delivery method: how was the content given? | |
| | | Unit of delivery: how were subjects grouped during delivery? | Example (unit of delivery): the intervention was delivered to small groups of 5–8 subjects |
| | | Deliverer: who delivered the intervention? | |
| | | Setting: where was the intervention delivered? | Examples (setting): the intervention was delivered in the bars; the intervention was delivered in the waiting rooms of sexually transmitted disease clinics |
| | | Exposure quantity and duration: how many sessions or episodes or events were intended to be delivered? How long were they intended to last? | Examples (exposure quantity and duration): the intervention was delivered in five 1-hour sessions; the intervention consisted of standard HIV counseling and testing (pretest and posttest counseling sessions, each about 30 minutes) |
| | | Time span: how long was it intended to take to deliver the intervention to each unit? | Examples (time span): each intervention session was to be delivered (in five 1-hour sessions) once a week for 5 weeks; the intervention was to be delivered over a 1-month period |
| | | Activities to increase compliance or adherence (e.g., incentives) | Example (activities to increase compliance or adherence): bus tokens and food stamps were provided |
| Objectives | 5 | • Specific objectives and hypotheses | |
| Outcomes | 6 | • Clearly defined primary and secondary outcome measures | |
| | | • Methods used to collect data and any methods used to enhance the quality of measurements | Examples (method used to collect data): self-report of behavioral data using a face-to-face interviewer-administered questionnaire; audio-computer-assisted self-administered instrument |
| | | • Information on validated instruments such as psychometric and biometric properties | |
| Sample size | 7 | • How sample size was determined and, when applicable, explanation of any interim analyses and stopping rules | |
| Assignment method | 8 | • Unit of assignment (the unit being assigned to study condition, e.g., individual, group, community) | Example 1 (assignment method): subjects were assigned to study conditions using an alternating sequence wherein every other individual enrolled (e.g., 1, 3, 5, etc.) was assigned to the intervention condition and the alternate subjects enrolled (e.g., 2, 4, 6, etc.) were assigned to the comparison condition |
| | | • Method used to assign units to study conditions, including details of any restriction (e.g., blocking, stratification, minimization) | |
| | | • Inclusion of aspects employed to help minimize potential bias induced due to nonrandomization (e.g., matching) | |
| | | | Example 2 (assignment method): for odd weeks (e.g., 1, 3, 5), subjects attending the clinic on Monday, Wednesday, and Friday were assigned to the intervention condition and those attending the clinic on Tuesday and Thursday were assigned to the comparison condition; this assignment was reversed for even weeks |
| Blinding (masking) | 9 | • Whether or not participants, those administering the interventions, and those assessing the outcomes were blinded to study condition assignment; if so, statement regarding how the blinding was accomplished and how it was assessed | Example (blinding): the staff member performing the assessments was not involved in implementing any aspect of the intervention and knew the participants only by their study identifier number |
| Unit of analysis | 10 | • Description of the smallest unit that is being analyzed to assess intervention effects (e.g., individual, group, or community) | Example 1 (unit of analysis): since groups of individuals were assigned to study conditions, the analyses were performed at the group level, where mixed effects models were used to account for random subject effects within each group |
| | | • If the unit of analysis differs from the unit of assignment, the analytical method used to account for this (e.g., adjusting the standard error estimates by the design effect or using multilevel analysis; a brief worked illustration of the design-effect adjustment follows the table note) | Example 2 (unit of analysis): since analyses were performed at the individual level and communities were randomized, a prior estimate of the intraclass correlation coefficient was used to adjust the standard error estimates before calculating confidence intervals |
| Statistical methods | 11 | • Statistical methods used to compare study groups for primary outcome(s), including complex methods for correlated data | |
| | | • Statistical methods used for additional analyses, such as subgroup analyses and adjusted analysis | |
| | | • Methods for imputing missing data, if used | |
| | | • Statistical software or programs used | |
| Results | | | |
| Participant flow | 12 | • Flow of participants through each stage of the study: enrollment, assignment, allocation and intervention exposure, follow-up, analysis (a diagram is strongly recommended) | |
| | | Enrollment: the numbers of participants screened for eligibility, found to be eligible or not eligible, declined to be enrolled, and enrolled in the study | |
| | | Assignment: the numbers of participants assigned to a study condition | |
| | | Allocation and intervention exposure: the number of participants assigned to each study condition and the number of participants who received each intervention | |
| | | Follow-up: the number of participants who completed the follow-up or did not complete the follow-up (i.e., lost to follow-up), by study condition | |
| | | Analysis: the number of participants included in or excluded from the main analysis, by study condition | |
| | | • Description of protocol deviations from study as planned, along with reasons | |
| Recruitment | 13 | • Dates defining the periods of recruitment and follow-up | |
| Baseline data | 14 | • Baseline demographic and clinical characteristics of participants in each study condition | |
| | | • Baseline characteristics for each study condition relevant to specific disease prevention research | Example (baseline characteristics specific to HIV prevention research): HIV serostatus and HIV testing behavior |
| | | • Baseline comparisons of those lost to follow-up and those retained, overall and by study condition | |
| | | • Comparison between study population at baseline and target population of interest | |
| Baseline equivalence | 15 | • Data on study group equivalence at baseline and statistical methods used to control for baseline differences | Example (baseline equivalence): the intervention and comparison groups did not statistically differ with respect to demographic data (gender, age, race/ethnicity; P > .05 for each), but the intervention group reported a significantly greater baseline frequency of injection drug use (P = .03); all regression analyses included baseline frequency of injection drug use as a covariate in the model |
| Numbers analyzed | 16 | • Number of participants (denominator) included in each analysis for each study condition, particularly when the denominators change for different outcomes; statement of the results in absolute numbers when feasible | Example (number of participants included in the analysis): the analysis of condom use included only those who reported at the 6-month follow-up having had vaginal or anal sex in the past 3 months (75/125 for the intervention group and 35/60 for the standard group) |
| | | • Indication of whether the analysis strategy was “intention to treat” or, if not, description of how noncompliers were treated in the analyses | Example (“intention to treat”): the primary analysis was intention to treat and included all subjects as assigned with available 9-month outcome data (125 of 176 assigned to the intervention and 110 of 164 assigned to the standard condition) |
| Outcomes and estimation | 17 | • For each primary and secondary outcome, a summary of results for each study condition, and the estimated effect size and a confidence interval to indicate the precision | |
| | | • Inclusion of null and negative findings | |
| | | • Inclusion of results from testing prespecified causal pathways through which the intervention was intended to operate, if any | |
| Ancillary analyses | 18 | • Summary of other analyses performed, including subgroup or restricted analyses, indicating which are prespecified or exploratory | Example (ancillary analyses): although the study was not powered for this hypothesis, an exploratory analysis shows that the intervention effect was greater among women than among men (although not statistically significant) |
| Adverse events | 19 | • Summary of all important adverse events or unintended effects in each study condition (including summary measures, effect size estimates, and confidence intervals) | Example (adverse events): police cracked down on prostitution, which drove the target population, commercial sex workers, to areas outside the recruitment/sampling area |
| Discussion | | | |
| Interpretation | 20 | • Interpretation of the results, taking into account study hypotheses, sources of potential bias, imprecision of measures, multiplicative analyses, and other limitations or weaknesses of the study | |
| | | • Discussion of results taking into account the mechanism by which the intervention was intended to work (causal pathways) or alternative mechanisms or explanations | |
| | | • Discussion of the success of and barriers to implementing the intervention, fidelity of implementation | |
| | | • Discussion of research, programmatic, or policy implications | |
| Generalizability | 21 | • Generalizability (external validity) of the trial findings, taking into account the study population, the characteristics of the intervention, length of follow-up, incentives, compliance rates, specific sites/settings involved in the study, and other contextual issues | |
| Overall evidence | 22 | • General interpretation of the results in the context of current evidence and current theory | |
Note. Masking (blinding) of participants or those administering the intervention may not be relevant or possible for many behavioral interventions. Theories used to design the interventions (see item 2) could also be reported as part of item 4. The comparison between study population at baseline and target population of interest (see item 14) could also be reported as part of item 21. Descriptors appearing in boldface are specifically added, modified, or further emphasized from the CONSORT statement. Boldface topic and descriptors are not included in the CONSORT statement but are relevant for behavioral interventions using nonrandomized experimental designs. The CONSORT statement11 or the explanation document for the CONSORT statement18 provides relevant examples for any topic or descriptor that is not in boldface. A structured format of the discussion is presented in Annals of Internal Medicine (information for authors; www.annals.org, accessed September 16, 2003).
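As a brief worked illustration of the design-effect adjustment mentioned in item 10 (the notation below is ours and is offered only as an illustration, not as part of the TREND statement): when intact groups of average size $\bar{m}$ are assigned to study conditions but outcomes are analyzed at the individual level, a standard error computed as if individuals were independent can be corrected by the design effect implied by the intraclass correlation coefficient $\rho$:

$$
\mathrm{DEFF} = 1 + (\bar{m} - 1)\,\rho, \qquad
\mathrm{SE}_{\text{adjusted}} = \mathrm{SE}_{\text{independent}} \times \sqrt{\mathrm{DEFF}}.
$$

For example, with groups of roughly 20 individuals and a prior estimate of $\rho = 0.02$, $\mathrm{DEFF} = 1 + 19(0.02) = 1.38$, so the naive standard error would be multiplied by about $\sqrt{1.38} \approx 1.17$ before confidence intervals are calculated, consistent with Example 2 for item 10.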
Members of the TREND Group are as follows: Kamran Abbasi, MBChB, MRCP (BMJ), William Blattner, MD (University of Maryland, Baltimore; Journal of Acquired Immune Deficiency Syndromes), Jay Bernhardt, PhD, MPH (Emory University; Health Education Research), Bruce Bullington, PhD (Florida State University; Journal of Drug Issues), Raul Caetano, MD, PhD, MPH (University of Texas; Addiction), Terry Chambers, BA (Journal of Psychoactive Drugs), Harris Cooper, PhD (Duke University; Psychological Bulletin), Roel A. Coutinho, MD, PhD (University of Amsterdam; AIDS), Nicole Crepaz, PhD (Centers for Disease Control and Prevention), John D. DeLamater, PhD (University of Wisconsin, Madison; Journal of Sex Research), Don C. Des Jarlais, PhD (Beth Israel Medical Center; American Journal of Public Health), Jeffrey H. Herbst, PhD (Centers for Disease Control and Prevention), David Holtgrave, PhD (Emory University), Barry Hong, PhD (Washington University at St. Louis; Journal of Consulting and Clinical Psychology), Angela B. Hutchinson, PhD (Centers for Disease Control and Prevention), Seth Kalichman, PhD (University of Connecticut; Health Psychology), Linda Kay, MPH (Centers for Disease Control and Prevention), Cynthia Lyles, PhD (Centers for Disease Control and Prevention), Robert McNutt, MD (Rush St. Luke’s Medical Center at Chicago; Journal of the American Medical Association), Thomas L. Patterson, PhD (University of California, San Diego; AIDS and Behavior), Michael Ross, PhD, MPH, MA, MHPEd (University of Texas, Houston; AIDS Care), Theo Sandfort, PhD (Columbia University; Archives of Sexual Behavior), Ron Stall, PhD, MPH (Centers for Disease Control and Prevention), Francisco S. Sy, MD, DrPH (Centers for Disease Control and Prevention; AIDS Education and Prevention), Cesar G. Victora, MD, PhD (Universidade Federal de Pelotas), and David Vlahov, PhD (New York Academy of Medicine; American Journal of Epidemiology, Journal of Urban Health).
We also thank the following Prevention Research Synthesis team members for their assistance in the meeting preparation and contribution to the data reporting presentation: Julia Britton, Tanesha Griffin, Angela Kim, Mary Mullins, Paola Marrero-Gonzalez, Sima Rama, R. Thomas Sherba, and Sekhar Thadiparthi.