© 2004 American Public Health Association
David M. Murray and Jonathan L. Blitstein are with the Department of Psychology, College of Arts and Sciences, University of Memphis, Memphis, Tenn. Sherri P. Varnell is with Northrop-Grumman Mission Systems, Atlanta, Ga. Correspondence: Requests for reprints should be sent to David M. Murray, PhD, 3693 Norriswood, 202 Psychology Bldg, Memphis, TN 38134 (e-mail: d.murray{at}mail.psyc.memphis.edu).
We review recent developments in the design and analysis of group-randomized trials (GRTs). Regarding design, we summarize developments in estimates of intraclass correlation, power analysis, matched designs, designs involving one group per condition, and designs in which individuals are randomized to receive treatments in groups. Regarding analysis, we summarize developments in marginal and conditional models, the sandwich estimator, model-based estimators, binary data, survival analysis, randomization tests, survey methods, latent variable methods and nonlinear mixed models, time series methods, global tests for multiple endpoints, mediation effects, missing data, trial reporting, and software. We encourage investigators who conduct GRTs to become familiar with these developments and to collaborate with methodologists who can strengthen the design and analysis of their trials.
Group-randomized trials (GRTs) are comparative studies designed to evaluate interventions that operate at a group level, manipulate the physical or social environment, or cannot be delivered to individuals.1 Examples include school-, worksite-, and community-based studies designed to improve the health of students, employees, and residents, respectively. Just as the randomized clinical trial (RCT) is the gold standard in public health and medicine when allocation of individual participants is possible, the GRT is the gold standard when allocation of identifiable groups is necessary. There are 4 characteristics that distinguish the GRT from the more familiar RCT. First, the unit of assignment is an identifiable group; such groups are formed not at random but rather through some physical, social, geographic, or other connection among their members. Second, different groups are assigned to each condition, creating a nested or hierarchical structure for the design and the data. Third, the units of observation are members of those groups nested within both their condition and their group. Fourth, usually only a limited number of groups are assigned to each condition. These characteristics create several problems in the design and analysis of GRTs.1 The major design problem is that a limited number of often heterogeneous groups makes it difficult for randomization to distribute potential sources of confounding evenly in any single realization of the experiment. This increases the need to use design strategies that will limit confounding and analytic strategies to deal with confounding when it is detected. The major analytic problem is that there is an expectation for a positive intraclass correlation (ICC) among observations of members of the same group.2 That ICC reflects an extra component of variance attributable to the group above and beyond the variance attributable to its members. 
This extra variation will increase the variance of any group-level statistic beyond what would be expected with random assignment of members to conditions. Moreover, with a limited number of groups, the degrees of freedom available to estimate group-level statistics are limited. Any test that ignores either the extra variation or the limited degrees of freedom will have a type I error rate that is inflated, and this effect will only worsen as the ICC increases.3 Cornfield4(p101-102) warned of this danger 25 years ago when he noted that ignoring these problems was "an exercise in self-deception . . . and should be discouraged." That warning was followed by a gradual increase in the number of methods papers in this area. The first comprehensive text on the design and analysis of GRTs appeared in 1998.1 It detailed the design considerations for the development of GRTs, described the major approaches to their analysis both for Gaussian and binary data, and presented methods for power analysis applicable to most GRTs. We use that text as a point of departure for this review and assume that readers are familiar with its basic material. Over the past 5 years, many articles have discussed the methodological issues involved in GRTs generally or in design papers describing new trials.5-28 The second textbook on design and analysis of GRTs appeared in 2000.29 That text provided a good history of GRTs and examined the role of informed consent and other ethical issues. It focused on extensions of classical methods, although it also included material on regression models for Gaussian, binary, count, and time-to-event data. Other textbooks on analysis methods germane to GRTs appeared during the same period,30-33 as well as a large number of articles on new methods relevant to the design and analysis of GRTs. In the sections that follow, we bring the reader up to date on many of these developments.
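The variance inflation described above can be illustrated with the standard design effect, 1 + (m - 1) x ICC, for the mean of groups with m members each. The following is a minimal sketch assuming equal group sizes; the function names are ours and are not taken from any of the cited texts.

```python
def design_effect(members_per_group: float, icc: float) -> float:
    """Variance inflation for a group-level mean: 1 + (m - 1) * ICC.

    Assumes every group has the same number of members (m).
    """
    return 1.0 + (members_per_group - 1.0) * icc


def effective_sample_size(n_groups: int, members_per_group: int, icc: float) -> float:
    """Number of independent observations carrying the same information."""
    total = n_groups * members_per_group
    return total / design_effect(members_per_group, icc)


# Even a small ICC matters when groups are large:
# 10 groups of 100 members with ICC = 0.01 behave like roughly 500
# independent observations rather than 1000.
deff = design_effect(100, 0.01)
n_eff = effective_sample_size(10, 100, 0.01)
```

Under these assumptions an ICC of only 0.01 nearly doubles the variance of a condition mean when groups have 100 members, which is why an analysis that ignores the group can badly overstate significance.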
In 1998, Murray1 detailed the design considerations for a GRT, whether the study was to use a nested cohort or nested cross-sectional design; whether the study was to have a posttest-only design, a pretest-posttest design, or an extended design with multiple pretest/posttest measures; and whether the design was to be completely randomized or to include matching/stratification. At that time, investigators were limited by the paucity of ICC and other parameter estimates needed to select an efficient design and to ensure that the study would have adequate power (the probability of rejecting the null hypothesis when it is false). One of the important recent developments has been the publication of papers providing estimates for those parameters. Another has been the publication of important refinements in the methods used for power analysis. There have also been important developments in several specific designs, including matched designs, designs involving 1 group per condition, and designs in which individuals are randomized to receive treatments in groups.
New Estimates of ICCs
Murray and Blitstein34 also reported a pooled analysis of ICCs from worksite, school, and community studies. They confirmed that the adverse impact of a positive ICC can be reduced by regression adjustment for covariates1,35-38 or by taking advantage of over-time correlation in a repeated measures analysis.1,35,39 Janega et al. (unpublished data, 2003) have shown that standard errors for intervention effects from end-of-study analyses that reflect these strategies are often different from the standard errors estimated from baseline analyses. Because the ICC of concern in any GRT is the ICC as it operates in the primary analysis,1 these findings reinforce the need for investigators to use estimates in their power analyses that closely reflect the endpoints, target population, and primary analysis planned for the trial. And while the sources just cited will help considerably in this regard, we join others who have urged publication of such estimates as a routine part of reporting the results of GRTs.40
Power Analysis
Third, the 2 factors that largely determine power in any GRT are the ICC and the number of groups per condition. For these reasons, there is no substitute for a good estimate of the ICC for the primary endpoint, the target population, and the primary analysis planned for the trial, and it is unusual for a GRT to have adequate power with fewer than 8 to 10 groups per condition. Finally, the formula for the standard error for the intervention effect depends on the primary analysis planned for the trial, and investigators should take care to calculate that standard error, and power, based on that analysis. Chapter 9 in the Murray text1 provides formulas for many of the common analyses, and generic formulas and examples are provided in recent work conducted by Janega et al. (unpublished data, 2003). Several variations on the standard power analysis have appeared during the past 5 years. Slymen and Hovell presented a method that allows the investigator to compare sample size requirements for a GRT and an RCT based on the anticipated magnitude both of the ICC and of any contamination.41 They showed that for small groups, where contamination was likely to be substantial, GRTs were a natural choice, while for large groups, where contamination was likely to be modest, RCTs were a natural choice. Hayes and Bennett presented sample size formulas for pair-matched and pair-unmatched GRTs in terms of coefficients of variation rather than ICCs for investigators more familiar with the former than the latter.21 Murray et al. defined the design effect as it operates in a random coefficient model and presented methods for power analyses of such models.42 Kerry and Bland compared 3 methods for weighting group means in sample size calculations when those means are based on a variable number of observations; they reported that minimum variance weights were superior to uniform weights, particularly when clusters were small, and superior to cluster-size weights, particularly when the clusters were large.43 Lake et al. showed how power could be improved without increasing the type I error rate using a strategy in which sample size is reestimated after the start of recruitment using the initial data.44 This strategy has application in situations in which many groups are to be randomized and recruitment of those groups is to take place over a long period of time (e.g., some family studies). Liu et al. provided a technical discussion of sample size and power for analytic models involving differences between means, slopes, or proportions for GRTs involving repeated observations of the same groups and members45; less technical presentations are also available.1,42 Raudenbush discussed sample size in GRTs accounting for the cost of recruiting members and groups and provided formulas for optimal size with and without covariate adjustment.46
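To make concrete how the ICC and the number of groups per condition drive power, the sketch below computes approximate power for a simple posttest-only comparison of 2 condition means. It is an illustration under stated assumptions, not the formula from any particular source cited here: equal group sizes, a common total variance, and a normal approximation in place of the t distribution. Because the appropriate degrees of freedom depend on the number of groups, the normal approximation will be optimistic when there are few groups per condition. All names are ours.

```python
from math import sqrt
from statistics import NormalDist


def grt_power(delta, sigma, icc, groups_per_condition, members_per_group,
              alpha=0.05):
    """Approximate two-sided power for a difference between 2 condition means.

    Assumed variance of the difference:
        2 * sigma^2 * (1 + (m - 1) * ICC) / (m * g)
    with m members per group and g groups per condition.
    Uses a normal approximation and ignores the far rejection tail.
    """
    m, g = members_per_group, groups_per_condition
    deff = 1.0 + (m - 1.0) * icc
    se = sqrt(2.0 * sigma ** 2 * deff / (m * g))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(abs(delta) / se - z_crit)


# Adding groups helps far more than adding members once the ICC is positive.
p_10_groups = grt_power(0.2, 1.0, 0.02, groups_per_condition=10, members_per_group=100)
p_20_groups = grt_power(0.2, 1.0, 0.02, groups_per_condition=20, members_per_group=100)
p_more_members = grt_power(0.2, 1.0, 0.02, groups_per_condition=10, members_per_group=200)
```

In this sketch, doubling the number of groups halves the variance of the intervention effect, whereas doubling the members per group leaves the between-group component untouched. For a real trial the standard error should be derived from the primary analysis planned for that trial.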
Matched Designs
Raab and Butcher proposed an alternative to matching50 based on a balancing criterion calculated as a weighted sum of squared differences between the condition means on any proposed covariates. Groups would be divided into 2 sets providing a small enough value on their criterion, followed by random assignment of sets to conditions. Raab and Butcher argued that this scheme would support model-based methods because it would fulfill the conditional independence criterion. To support a randomization test, they proposed that the criterion be calculated for all possible allocations of groups to conditions, that some subset of those allocations be identified as having a small enough value on the criterion to be acceptable, and that one such allocation be chosen at random, followed by random assignment of sets to conditions.
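The allocation scheme just described can be sketched in a few lines. This is an illustration of the general idea only, assuming an even number of groups split into 2 equal sets; the covariate weights, the fraction of allocations treated as acceptable, and all function names are our own assumptions rather than values or code from Raab and Butcher.

```python
import itertools
import random


def balance_score(first_set, covariates, weights):
    """Weighted sum of squared differences between the 2 sets' covariate means."""
    in_first = set(first_set)
    second_set = [i for i in range(len(covariates)) if i not in in_first]
    score = 0.0
    for k, weight in enumerate(weights):
        mean_first = sum(covariates[i][k] for i in first_set) / len(first_set)
        mean_second = sum(covariates[i][k] for i in second_set) / len(second_set)
        score += weight * (mean_first - mean_second) ** 2
    return score


def constrained_allocation(covariates, weights, keep_fraction=0.25, rng=None):
    """Pick one well-balanced split at random, then randomize sets to conditions.

    covariates: one tuple of group-level covariate values per group.
    keep_fraction: hypothetical cutoff defining "small enough" on the criterion.
    Returns (intervention_groups, control_groups, acceptable_splits).
    """
    rng = rng or random.Random()
    n = len(covariates)
    half = n // 2
    # Keep group 0 in the first set so that each split is enumerated once.
    splits = [(0,) + rest
              for rest in itertools.combinations(range(1, n), half - 1)]
    ranked = sorted(splits, key=lambda s: balance_score(s, covariates, weights))
    acceptable = ranked[:max(1, int(len(ranked) * keep_fraction))]
    first = list(rng.choice(acceptable))
    second = [i for i in range(n) if i not in first]
    # Random assignment of the 2 sets to conditions.
    if rng.random() < 0.5:
        return first, second, acceptable
    return second, first, acceptable
```

A call such as `constrained_allocation(covs, (1.0, 100.0), 0.25, random.Random(1))` returns the group indices for each condition along with the set of acceptable splits, which is the reference set a subsequent randomization test would need to use.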
One Group per Condition
Individuals Randomized to Receive Treatments in Groups
Most recently, Varnell et al. compared analyses for these studies in simulations, varying the number of groups per condition, the magnitude of the ICC, and the number of conditions that received an intervention in small groups while fixing the intervention effect at zero.56 Analyses that ignored the ICC had an inflated type I error rate, with the magnitude of the problem dependent on the size of the ICC, the number of members per group, and the number of conditions in which participants received treatment in groups. A mixed-model regression approach with the group included as a nested random effect and degrees of freedom based on the number of groups carried the nominal type I error rate. This finding confirms that allowing participants to interact with each other in small groups does not maintain the independence of observations required for the usual RCT analyses.
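The pattern reported in these simulations can be reproduced in miniature. The sketch below is not the simulation design of Varnell et al.; it is a small illustration under our own assumptions: Gaussian data, both conditions delivered in groups, 5 groups of 50 members per condition, an ICC of 0.05, and no intervention effect. The group-level analysis is a t test on group means with 2(g - 1) degrees of freedom, which in balanced data gives the same test as a mixed model with group as a nested random effect. The critical value 2.306 is hard-coded for 8 degrees of freedom and must be changed if the number of groups changes.

```python
import random
from math import sqrt
from statistics import mean, variance

Z_CRIT = 1.96        # two-sided 5% cutpoint, standard normal
T_CRIT_DF8 = 2.306   # two-sided 5% cutpoint, t with 8 df (5 groups per condition)


def simulate_condition(rng, n_groups, n_members, icc):
    """Draw one condition: total variance 1, of which `icc` lies between groups."""
    groups = []
    for _ in range(n_groups):
        group_effect = rng.gauss(0.0, sqrt(icc))
        groups.append([group_effect + rng.gauss(0.0, sqrt(1.0 - icc))
                       for _ in range(n_members)])
    return groups


def one_trial(rng, n_groups, n_members, icc, t_crit):
    """Return (naive_rejects, group_level_rejects) for one trial under the null."""
    a = simulate_condition(rng, n_groups, n_members, icc)
    b = simulate_condition(rng, n_groups, n_members, icc)

    # Naive analysis: pool all members and ignore the group entirely.
    ya = [y for g in a for y in g]
    yb = [y for g in b for y in g]
    se_naive = sqrt(variance(ya) / len(ya) + variance(yb) / len(yb))
    naive = abs(mean(ya) - mean(yb)) / se_naive > Z_CRIT

    # Group-level analysis: t test on group means, df = 2 * (n_groups - 1).
    ma = [mean(g) for g in a]
    mb = [mean(g) for g in b]
    se_group = sqrt((variance(ma) + variance(mb)) / n_groups)
    grouped = abs(mean(ma) - mean(mb)) / se_group > t_crit
    return naive, grouped


def type_one_error_rates(n_reps=400, n_groups=5, n_members=50, icc=0.05,
                         t_crit=T_CRIT_DF8, seed=2004):
    """Estimate rejection rates for both analyses when the true effect is zero."""
    rng = random.Random(seed)
    naive_hits = group_hits = 0
    for _ in range(n_reps):
        naive, grouped = one_trial(rng, n_groups, n_members, icc, t_crit)
        naive_hits += naive
        group_hits += grouped
    return naive_hits / n_reps, group_hits / n_reps
```

With these settings the design effect is 1 + 49 x 0.05 = 3.45, so the naive analysis should reject far more often than 5% of the time while the group-level analysis should stay near the nominal rate, apart from simulation error.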
Murray1 identified several analytic approaches that can provide a valid analysis for GRTs. In each, the intervention effect is defined as a function of a condition-level statistic (e.g., difference in means, rates, or slopes) and assessed against the variation in the corresponding group-level statistic. These approaches included mixed-model analysis of variance (ANOVA)/analysis of covariance (ANCOVA) for designs having only 1 or 2 time intervals, random coefficient models for designs having 3 or more time intervals, and randomization tests as an alternative to the model-based methods. Murray1 identified other approaches as invalid for GRTs because they ignored or misrepresented a source of random variation. These included (1) analyses that assessed condition variation against individual variation and ignored the group, (2) analyses that assessed condition variation against individual variation and included the group as a fixed effect, (3) analyses that assessed the condition variation against subgroup variation, and (4) analyses that assessed condition variation against the wrong type of group variation. Murray1 identified still other strategies as having limited application for GRTs. Application of fixed-effect models with post hoc correction for extra variation and limited degrees of freedom assumes that the correction is based on an appropriate ICC estimate, and in 1998 few estimates were available. Application of survey-based methods or generalized estimating equations (GEE) and the sandwich method for standard errors requires that a total of 40 or more groups be included in the study, and in 1998 most GRTs did not include 40 groups. During the past 5 years, considerable attention has been focused on analytic issues germane to GRTs, including refinements for existing methods and development of new methods. Much of this work has occurred outside the context of GRTs but has application to GRTs, and so we include it in this review.
Conditional versus Marginal Models
In the marginal model, the condition coefficient is the between-person difference in the log odds of the outcome comparing the effects of the intervention and control conditions as if they had been delivered to 2 different individuals. In the conditional model, the condition coefficient is the within-person change in the log odds of the outcome comparing the effect of the intervention and control conditions as if they had been delivered to the same individual. Several recent papers have recommended conditional models for GRTs focused on change within participants (e.g., preintervention vs postintervention) and marginal models for GRTs focused on differences between participants (e.g., intervention condition vs control condition). Unfortunately, both approaches have problems in certain binary data situations; because these issues affect the remainder of our presentation, we consider them first.
Limitations of the Sandwich Estimator Used in Marginal Models
More recent work has also focused on the development and evaluation of correction procedures, though usually not in the context of GRTs. Long and Ervin69 provided additional results for 3 corrections introduced earlier by MacKinnon and White65 and reported that a jackknife estimator (a nonparametric method to estimate standard errors based on repeated subsamples) was better than the alternatives. Mancl and DeRouen reported a corrected estimator that was of nominal size even with 10 groups per condition and only 16 observations per group67; they also offered an SAS macro. Corcoran et al.70 offered an exact test, but it has only narrow application to situations in which the groups represent ordered levels of an underlying factor such as dose. Fay and Graubard reported that the sandwich estimator worked well, even in small samples, so long as the usual Wald test was evaluated not as a A similar correction provided by Kauermann and Carroll replaces the usual cutpoint in the z distribution with a cutpoint that is a function of the variance of the sandwich estimator; they demonstrated its utility even when the sample size was as small as 5.72 Pan and Wall offered a correction much like that of Fay and Graubard in the form of an approximate t or F test, with degrees of freedom defined as a function of the variance of the sandwich estimator.73 Bell and McCaffrey74 offered a correction and a Satterthwaite approach to degrees of freedom that seemed to involve less bias and a better type I error rate than the sandwich estimator or the corrected estimators recommended by Long and Ervin69 or Mancl and DeRouen.67 Preisser et al. suggested using a model-based variance estimator in GEE, rather than the sandwich estimator, as another solution.75 Unfortunately, none of these corrections appear in the standard software packages, so they are relatively unavailable to investigators who analyze GRTs. 
Absent an effective correction, the sandwich estimator will have an inflated type I error rate in GRTs involving fewer than 40 groups, and investigators who use this approach continue to risk overstating the significance of their findings.
Limitations of Model-Based Estimators Used in Conditional Models
Methods for Binary Data
Several Bayesian approaches have also been suggested. Kleinman and Ibrahim proposed a semiparametric Bayesian approach to generalized linear mixed models but provided no simulation results to evaluate their method.82 Ten Have and Localio83 proposed an empirical Bayes method based on numerical integration and incorporated an adjustment for the standard error; their method performed better than PQL estimation given many small groups (100 groups with 2 observations per group) but not as well as PQL estimation with a smaller number of larger groups (20 groups and 100 observations per group). As such, their method may be useful in family-based GRTs but not in school-, worksite-, or community-based GRTs. Turner et al. discussed a Bayesian approach involving specification of an informative prior ICC distribution based on values taken from the literature84; as published values for ICCs become increasingly available, their approach may prove useful. A much simpler approach for binary data was reported by Hannan and Murray,79 who indicated that the familiar conditional model for Gaussian data carried the nominal type I error rate even when applied to binary data with an ICC as large as 0.05, so long as there were at least 4 groups per condition and 25 observations per group.
Methods for Survival Analysis
Marginal survival models employ standard Cox regression methods to estimate the effect of the intervention and then use the sandwich estimator to obtain standard errors for the fixed effects87-89; their intervention effect estimates are readily interpretable, but caution is required if the total number of groups is less than 40. Sargent described an adaptation of the Cox model to incorporate random effects using Bayesian methods but provided no simulation data on the performance of the method.90 Vaida and Xu91 described a random-effects model for proportional hazards regression similar to that of Sargent, but they also did not provide simulation results. Yau92 proposed a 3-level proportional hazards model estimated via REML. He reported results from a simulation study involving only 10 groups with just 3 members per group and 3 repeated observations for each member; censoring varied from 30% to 60%. Yau's method provided unbiased estimates of fixed effects but slightly overestimated random effects; the overestimation of random effects was reduced with even slightly increased group size. Other advantages were that the baseline hazard function did not have to be specified and estimation did not rely on numerical integration. Cai et al.88 proposed a transformation model with random effects based on numerical integration and showed that it was less biased than some of the earlier parametric models. Lui et al. proposed several methods for confidence interval estimation for rate ratios based on the beta-binomial distribution93; they reported that an interval estimator based on a log transform performed best in simulations, but their smallest study included 20 groups per condition, so the small sample properties of the estimator are unknown. Bennett et al. presented a 2-stage approach to analysis of incidence rates based on person-year data,94 estimating group-specific rates (for an unadjusted analysis) or residuals (for an adjusted analysis) in a first stage without regard to intervention status; these rates or residuals were used in a second stage to estimate the intervention effect and assessed via a t statistic with degrees of freedom based on the number of groups. Simulation studies showed this approach had nominal size even with as few as 3 groups per condition and perhaps 30 members per group. While these results are encouraging, it would be of interest to see how the method performs with smaller groups.
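The unadjusted version of such a 2-stage analysis can be sketched as follows. This is our own minimal illustration of the general idea, assuming an unweighted difference in group-specific rates as the intervention effect; the published method may define the effect differently (e.g., as a rate ratio), and the adjusted version based on residuals is not shown. The function name and data layout are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def two_stage_rate_test(intervention, control):
    """Two-stage analysis of incidence rates from person-year data.

    Each argument is a list with one (events, person_years) pair per group.
    Returns (rate_difference, t_statistic, degrees_of_freedom).
    """
    # Stage 1: group-specific rates, computed without regard to condition.
    rates_i = [events / person_years for events, person_years in intervention]
    rates_c = [events / person_years for events, person_years in control]

    # Stage 2: unpaired t statistic on the group rates,
    # with df based on the number of groups, not the number of members.
    g_i, g_c = len(rates_i), len(rates_c)
    df = g_i + g_c - 2
    pooled_var = ((g_i - 1) * variance(rates_i)
                  + (g_c - 1) * variance(rates_c)) / df
    se = sqrt(pooled_var * (1.0 / g_i + 1.0 / g_c))
    diff = mean(rates_i) - mean(rates_c)
    return diff, diff / se, df


# Hypothetical data: 3 groups per condition, 1000 person-years each.
diff, t_stat, df = two_stage_rate_test(
    [(10, 1000), (12, 1000), (8, 1000)],
    [(20, 1000), (18, 1000), (22, 1000)],
)
```

The t statistic is then referred to a t distribution with the returned degrees of freedom (here 4), which is what keeps the test honest when only a handful of groups are available.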
Randomization Tests
At the same time, randomization tests can have less power than model-based tests when the model is correct. To address that problem, Braun and Feng99 developed a weighted randomization test using the inverse of the total variance for each group as the weight; they showed this test to be the uniformly most powerful randomization test for Gaussian data. They also developed a locally most powerful randomization test based on a more complicated quasi-score method for non-Gaussian data. In a series of simulation studies, Braun and Feng showed that their optimal randomization test had nominal size and better power than alternative randomization tests or GEE, although it was still not as powerful as the model-based analysis when the model was specified correctly; additional research is needed to compare Braun and Feng's optimal randomization test and model-based methods under model misspecification.
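For readers unfamiliar with the basic procedure, the sketch below shows an ordinary, unweighted randomization test applied to group means. It is not the weighted test of Braun and Feng; it simply enumerates every possible allocation of groups to conditions and asks how often the difference in condition means is at least as extreme as the one observed. It assumes a completely randomized design with a number of groups small enough to enumerate; the function name is ours.

```python
import itertools
from statistics import mean


def group_randomization_test(intervention_means, control_means):
    """Exact two-sided randomization test on group-level means.

    Returns (observed_difference, p_value), where the p value is the share
    of all possible allocations whose absolute difference in condition
    means is at least as large as the observed one.
    """
    observed = mean(intervention_means) - mean(control_means)
    pooled = list(intervention_means) + list(control_means)
    n_intervention = len(intervention_means)
    indices = range(len(pooled))

    as_extreme = total = 0
    for chosen in itertools.combinations(indices, n_intervention):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= abs(observed) - 1e-12:
            as_extreme += 1
    return observed, as_extreme / total


# With 4 groups per condition there are 70 possible allocations, so the
# smallest attainable two-sided p value is 2/70.
diff, p_value = group_randomization_test([1, 2, 3, 4], [5, 6, 7, 8])
```

The example also shows a practical limit of randomization tests in small GRTs: the set of attainable p values is determined by the number of possible allocations, so very few groups per condition may make conventional significance levels unreachable.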
Survey Methods
Latent Variable Methods and Nonlinear Models
Nonlinear mixed models are a type of mixed model in which both the fixed and random effects have a nonlinear relationship with the endpoint. They differ from the more familiar generalized linear mixed models, in which the fixed and random effects are linearly related to a predictor and the predictor is related to the endpoint through a nonlinear link function. Readers are referred to Davidian and Giltinan107 or Vonesh and Chinchilli108 for further information.
Interrupted Time Series
Global Tests for Multiple Endpoints
Methods for Analysis of Mediation Effects
Missing Data
Software
Several SAS (http://www.sas.com/) procedures support analyses for GRTs. PROC MIXED117,118 supports models and covariance structures for Gaussian endpoints.1,119 The GLIMMIX macro118 supports parallel models and structures for non-Gaussian endpoints and can perform mixed-model logistic and Poisson regression. Some have criticized GLIMMIX because it uses pseudo-likelihood estimation, which is similar to PQL and so underestimates fixed effects and their standard errors under the circumstances noted earlier. However, because most GRTs do not fit those circumstances, GLIMMIX continues to be a valid tool in most GRTs. More recently, SAS introduced PROC NLMIXED, which is a nonlinear mixed-model regression procedure.117 NLMIXED uses numerical integration for ML estimation and so is more appropriate than GLIMMIX for GRTs that involve very small groups (e.g., family studies). NLMIXED can be used with Gaussian, binomial, and Poisson distributions for mixed-model linear, logistic, and Poisson regression; users can also construct their own log-likelihood function to perform, for example, a clustered ordinal logistic regression or frailty analysis (O. Schabenberger; written communication; April 9, 2003). NLMIXED can accommodate nested designs, although the procedure will encounter computational difficulties if the number of random terms exceeds 5 or 6.120 The NLMIXED procedure does not support the within-group repeated measures structures available in MIXED and GLIMMIX; instead, NLMIXED assumes that repeated observations within a member or group are uncorrelated. MIXED and GLIMMIX support model-based and sandwich estimation for standard errors, while NLMIXED provides only model-based estimation.
PROC PHREG and PROC GENMOD support sandwich estimation for standard errors and so can be applied to GRTs to perform Cox regression and logistic and ordinal logistic regression, respectively121; however, caution is required when there are fewer than 40 groups, absent a correction for the bias in the sandwich estimator. MIXOR (http://tigger.uic.edu/~hedeker/mix.html) and its related programs122-124 can be used with Gaussian, binary, and Poisson data to provide mixed-model linear, logistic, and Poisson regression. These programs also allow mixed-model grouped-time survival analysis,85 mixed-model logistic or probit analysis for ordinal endpoints,125 and mixed-model logistic regression for nominal endpoints.124,126 The MLwiN program (http://multilevel.ioe.ac.uk/index.html) can be used with Gaussian, Bernoulli, binomial, multinomial, and Poisson distributions and can also fit ordinal logistic models for clustered data.127 The SUDAAN software package (http://www.rti.org/sudaan/home.cfm) (Research Triangle Institute, Research Triangle Park, NC) supports models for analysis of survey data that are often applicable to GRTs. In addition, SPSS (http://www.spss.com) has introduced a mixed-model regression program that supports several covariance structures.128 None of the programs just mentioned incorporate a correction for the underestimation bias in the sandwich estimator when the data are binary and there are few groups per condition. As indicated earlier, the work in that area seems to be converging on a solution, and this may encourage the developers to add such a correction to their procedures.
Recommendations for Trial Reporting
The purpose of this article has been to review the methodological developments from the past 5 years regarding the design and analysis of GRTs. The sheer volume of work is quite remarkable, and while every effort was made to provide a thorough review based on extensive searches of electronic databases and other sources, there are no doubt relevant papers that we did not include. Nonetheless, this review makes clear that there are valid methods that are readily available and well documented for the design and analysis of GRTs. We hope that this review will help investigators familiarize themselves with these methods and encourage them to collaborate with methodologists who can use these developments to strengthen the design and analysis of their trials. Certainly, the methods required for GRTs are not as simple as those required for RCTs, and this is unfortunate. As noted 5 years ago, however: "Whenever the investigator wants to evaluate an intervention that operates at a group level, manipulates the social or physical environment, or cannot be delivered to individuals, a group-randomized trial design is the best comparative design available."1(p15) When that text appeared in 1998, it attempted to address the question of how to conduct GRTs well. Clearly the developments of the past 5 years have made it even easier to conduct GRTs well, and we simply must do a better job of taking advantage of these developments.
We wish to acknowledge helpful comments from Zideng Feng, Fred Hutchinson Cancer Research Center; Barry Graubard, National Cancer Institute; Peter Hannan, University of Minnesota; Donald Hedeker, University of Illinois, Chicago; Stephen Raudenbush, University of Michigan; Oliver Schabenberger, SAS Institute Inc; and Alexander Wagenaar, University of Minnesota.
Contributors
D. M. Murray wrote the first and final drafts of the article. S. P. Varnell and J. L. Blitstein located many of the reviewed articles and edited several versions of the article.
Accepted for publication September 12, 2003.
1. Murray DM. Design and Analysis of GroupRandomized Trials. New York, NY: Oxford University Press Inc; 1998. 2. Kish L. Survey Sampling. New York, NY: John Wiley & Sons Inc; 1965. 3. Zucker DM. An analysis of variance pitfall: the fixed effects analysis in a nested design. Educ Psychol Meas. 1990;50:731738.[Abstract]
4. Cornfield J. Randomization by group: a formal analysis. Am J Epidemiol. 1978;108:100102.
5. Campbell MK, Grimshaw JM. Cluster randomised trials: time for improvement. BMJ. 1998;317:11711172.
6. Atienza AA, King AC. Community-based health intervention trials: an overview of methodological issues. Epidemiol Rev. 2002;24:7279.
7. Kerry SM, Bland JM. Analysis of a trial randomised in clusters. BMJ. 1998;316:54. 8. Kirkwood BR, Cousens SN, Victora CG, de Zoysa I. Issues in the design and interpretation of studies to evaluate the impact of community-based interventions. Trop Med Int Health. 1997;2:10221029.[Web of Science][Medline]
9. Campbell MK, Mollison J, Steen N, Grimshaw JM, Eccles M. Analysis of cluster randomized trials in primary care: a practical approach. Fam Pract. 2000;17:192196. 10. Donner A. Some aspects of the design and analysis of cluster randomization trials. Appl Stat. 1998;47:95113. 11. Carvajal SC, Baumler E, Harrist RB, Parcel GS. Multilevel models and unbiased tests for group based interventions: examples from the Safer Choices Study. Multivariate Behav Res. 2001;36:185205. 12. Kenny DA, Mannetti L, Pierro A, Livi S, Kashy DA. The statistical analysis of data from small groups. J Pers Soc Psychol. 2002;83:126137.[Web of Science][Medline]
13. Kerry SM, Bland JM. Trials which randomize practices I: how should they be analysed? Fam Pract. 1998;15:8083.
14. Bloom HS, Bos JM, Lee S-W. Using cluster random assignment to measure program impacts: statistical implications for the evaluation of education programs. Eval Rev. 1999;23:445469. 15. Altman DG. Statistics in medical journals: some recent trends. Stat Med. 2000;19:32753289.[Web of Science][Medline] 16. Klar N, Donner A. Current and future challenges in the design and analysis of cluster randomization trials. Stat Med. 2001;20:37293740.[Web of Science][Medline] 17. Feng Z, Diehr P, Peterson A, McLerran D. Selected statistical issues in group randomized trials. Annu Rev Public Health. 2001;22:167187.[Web of Science][Medline] 18. Feng Z, Thompson B. Some design issues in a community intervention trial. Control Clin Trials. 2002;23:431449.[Web of Science][Medline]
19. Bland JM. Sample size in guidelines trials. Fam Pract. 2000;17(suppl):S17S20.
20. Kerry SM, Bland JM. Trials which randomize practices II: sample size. Fam Pract. 1998;15:8487.
21. Hayes RJ, Bennett S. Simple sample size calculation for cluster-randomized trials. Int J Epidemiol. 1999;28:319326. 22. Resnicow K, Braithwaite R, Dilorio C, Vaughan R, Cohen MI, Uhl GA. Preventing substance use in high risk youth: evaluation challenges and solutions. J Primary Prev. 2001;21:399415. 23. Zucker DM. Design and analysis of cluster randomization trials. In: Geller N, ed. Advances in Clinical Trials Biostatistics. New York, NY: Marcel Dekker Inc; 2003. 24. Murray DM. Statistical models appropriate for designs often used in group-randomized trials. Stat Med. 2001;20:13731385.[Web of Science][Medline] 25. Reed JF. Eliminating bias in randomized cluster trials with correlated binomial outcomes. Comput Methods Programs Biomed. 2000;61:119123.[Web of Science][Medline] 26. Brown CH, Liao J. Principles for designing randomized preventive trials in mental health: an emerging developmental epidemiology paradigm. Am J Community Psychol. 1999;27:673710.[Web of Science][Medline]
27. Hayes RJ, Alexander NDE, Bennett S, Cousens SN. Design and analysis issues in cluster-randomized trials of interventions against infectious diseases. Stat Methods Med Res. 2000;9:95116. 28. Loeys T, Vansteelandt S, Goetghebeur E. Accounting for correlation and compliance in cluster randomized trials. Stat Med. 2001;20:37533767.[Web of Science][Medline] 29. Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. London, England: Arnold; 2000. 30. McCulloch CE, Searle SR. Generalized, Linear and Mixed Models. New York, NY: John Wiley & Sons Inc; 2001. 31. Brown H, Prescott R. Applied Mixed Models in Medicine. Chichester, England: John Wiley & Sons Inc; 1999. 32. Raudenbush SW, Bryk AS. Hierarchical Linear Models. 2nd ed. Thousand Oaks, Calif: Sage Publications; 2002. 33. Kreft I, De Leeuw J. Introducing Multilevel Modeling. London, England: Sage Publications; 1998.
34. Murray DM, Blitstein JL. Methods to reduce the impact of intraclass correlation in group-randomized trials. Eval Rev. 2003;27:79–103.
35. Feng Z, Diehr P, Yasui Y, Evans B, Beresford S, Koepsell TD. Explaining community-level variance in group randomized trials. Stat Med. 1999;18:539–556.
36. Murray DM, Short BJ. Intraclass correlation among measures related to alcohol use by school aged adolescents: estimates, correlates, and applications in intervention studies. J Drug Educ. 1996;26:207–230.
37. Murray DM, Short BJ. Intraclass correlation among measures related to alcohol use by young adults: estimates, correlates and applications in intervention studies. J Stud Alcohol. 1995;56:681–694.
38. Murray DM, Short BJ. Intraclass correlation among measures related to tobacco use by adolescents: estimates, correlates, and applications in intervention studies. Addict Behav. 1997;22:1–12.
39. Murray DM, Clark MH, Wagenaar AC. Intraclass correlations from a community-based alcohol prevention study: the effect of repeat observations on the same communities. J Stud Alcohol. 2000;61:881–890.
40. Elbourne DR, Campbell MK. Extending the CONSORT statement to cluster randomized trials: for discussion. Stat Med. 2001;20:489–496.
41. Slymen DJ, Hovell MF. Cluster versus individual randomization in adolescent tobacco and alcohol studies: illustrations for design decisions. Int J Epidemiol. 1997;26:765–771.
42. Murray DM, Feldman HA, McGovern PG. Components of variance in a group-randomized trial analyzed via a random-coefficients model: the REACT Trial. Stat Methods Med Res. 2000;9:117–133.
43. Kerry SM, Bland JM. Unequal cluster sizes for trials in English and Welsh general practice: implications for sample size calculations. Stat Med. 2001;20:377–390.
44. Lake S, Kammann E, Klar N, Betensky R. Sample size re-estimation in cluster randomization trials. Stat Med. 2002;21:1337–1350.
45. Liu A, Shih WJ, Gehan E. Sample size and power determination for clustered repeated measurements. Stat Med. 2002;21:1787–1801.
46. Raudenbush SW. Statistical analysis and optimal design in cluster randomized trials. Psychol Methods. 1997;2:173–185.
47. Varnell S, Murray DM, Janega JB, Blitstein JL. Design and analysis of group-randomized trials: a review of recent practices. Am J Public Health. 2004;94:393–399.
48. Klar N, Donner A. The merits of matching in community intervention trials: a cautionary tale. Stat Med. 1997;16:1753–1764.
49. Thompson SG. The merits of matching in community intervention trials: a cautionary tale [letter]. Stat Med. 1998;17:2147–2152.
50. Raab GM, Butcher I. Balance in cluster randomized trials. Stat Med. 2001;20:351–365.
51. Varnell SP, Murray DM, Baker WL. An evaluation of analysis options for the one group per condition design: can any of the alternatives overcome the problems inherent in this design? Eval Rev. 2001;25:440–453.
52. Whiting-O'Keefe QE, Henke C, Simborg DW. Choosing the correct unit of analysis in medical care experiments. Med Care. 1984;22:1101–1114.
53. Roberts C. The implications of variation in outcome between health professionals for the design and analysis of randomized controlled trials. Stat Med. 1999;18:2605–2615.
54. Schnurr PP, Friedman MJ, Lavori PW, Hsieh FY. Design of Department of Veterans Affairs Cooperative Study No. 420: group treatment of posttraumatic stress disorder. Control Clin Trials. 2001;22:74–88.
55. Hoover DR. Clinical trials of behavioral interventions with heterogeneous teaching subgroup effects. Stat Med. 2002;21:1351–1364.
56. Varnell S, Murray DM, Hannan PJ, Baker WL. Intraclass correlation at the level of the unit of intervention in a randomized clinical trial: implications for analysis. Paper presented at: Annual Meeting of the American Evaluation Association, November 7–10, 2001, St. Louis, Mo.
57. Harville DA. Maximum likelihood approaches to variance component estimation and to related problems. J Am Stat Assoc. 1977;72:320–338.
58. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
59. Zeger SL, Liang K-Y. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42:121–130.
60. Thornquist MD, Anderson GL. Small sample properties of generalized estimating equations in group-randomized designs with Gaussian response. Paper presented at: 120th Annual Meeting of the American Public Health Association, October 8–12, 1992, Washington, DC.
61. Feng Z, McLerran D, Grizzle J. A comparison of statistical methods for clustered data analysis with Gaussian error. Stat Med. 1996;15:1793–1806.
62. Murray DM, Hannan PJ, Baker WL. A Monte Carlo study of alternative responses to intraclass correlation in community trials: is it ever possible to avoid Cornfield's penalties? Eval Rev. 1996;20:313–337.
63. Emrich LJ, Piedmonte MR. On some small sample properties of generalized estimating equation estimates for multivariate dichotomous outcomes. J Stat Computation Simulation. 1992;41:19–29.
64. Lipsitz SR, Fitzmaurice GM, Orav EJ, Laird NM. Performance of generalized estimating equations in practical situations. Biometrics. 1994;50:270–278.
65. MacKinnon JG, White H. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. J Econometrics. 1985;29:305–325.
66. Murray DM, Hannan PJ, Wolfinger RD, Baker WL, Dwyer JH. Analysis of data from group-randomized trials with repeat observations on the same groups. Stat Med. 1998;17:1581–1600.
67. Mancl LA, DeRouen TA. A covariance estimator for GEE with improved small-sample properties. Biometrics. 2001;57:126–134.
68. Bellamy SL, Gibberd R, Hancock L, et al. Analysis of dichotomous outcome data for community intervention studies. Stat Methods Med Res. 2000;9:135–159.
69. Long JS, Ervin LH. Using heteroscedasticity consistent standard errors in the linear regression model. Am Statistician. 2000;54:217–224.
70. Corcoran C, Ryan L, Senchaudhuri P, Mehta C, Patel N, Molenberghs G. An exact trend test for correlated binary data. Biometrics. 2001;57:941–948.
71. Fay M, Graubard P. Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics. 2001;57:1198–1206.
72. Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J Am Stat Assoc. 2001;96:1387–1396.
73. Pan W, Wall MM. Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations. Stat Med. 2002;21:1429–1441.
74. Bell RM, McCaffrey DF. Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology. 2002;28:169–181.
75. Preisser JS, Young ML, Zaccaro DJ, Wolfson M. An integrated population-averaged approach to the design, analysis and sample size determination of cluster-unit trials. Stat Med. 2003;22:1235–1254.
76. Rodriguez G, Goldman N. An assessment of estimation procedures for multilevel models with binary responses. J R Stat Soc. 1995;158:73–89.
77. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc. 1993;88:9–25.
78. Ten Have TR, Kunselman A, Zharichenko E. Accommodating negative intracluster correlation with a mixed effects logistic model for bivariate binary data. J Biopharm Stat. 1998;8:131–149.
79. Hannan PJ, Murray DM. Gauss or Bernoulli? A Monte Carlo comparison of the performance of the linear mixed model and the logistic mixed model analyses in simulated community trials with a dichotomous outcome variable at the individual level. Eval Rev. 1996;20:338–352.
80. Gibbons RD, Hedeker D. Random effects probit and logistic regression models for three-level data. Biometrics. 1997;53:1527–1537.
81. Aitkin M. A general maximum likelihood analysis of variance components in generalized linear models. Biometrics. 1999;55:117–128.
82. Kleinman KP, Ibrahim JG. A semi-parametric Bayesian approach to generalized linear mixed models. Stat Med. 1998;17:2579–2596.
83. Ten Have TR, Localio AR. Empirical Bayes estimation of random effects parameters in mixed effects logistic regression models. Biometrics. 1999;55:1022–1029.
84. Turner RM, Omar RZ, Thompson SG. Bayesian methods of analysis for cluster randomized trials with binary outcome data. Stat Med. 2001;20:453–472.
85. Hedeker D, Siddiqui O, Hu FB. Random-effects regression analysis of correlated grouped-time survival data. Stat Methods Med Res. 2000;9:161–179.
86. Ross EA, Moore D. Modeling clustered, discrete, or grouped time survival data with covariates. Biometrics. 1999;55:813–819.
87. Segal MR, Neuhaus JM, James IR. Dependence estimation for marginal models of multivariate survival data. Lifetime Data Analysis. 1997;3:251–268.
88. Cai T, Cheng SC, Wei LJ. Semiparametric mixed-effects models for clustered failure time data. J Am Stat Assoc. 2002;95:514–522.
89. Gray RJ, Li Y. Optimal weight functions for marginal proportional hazards analysis of clustered failure time data. Lifetime Data Analysis. 2002;8:5–19.
90. Sargent DJ. A general framework for random effects survival analysis in the Cox proportional hazards setting. Biometrics. 1998;54:1486–1497.
91. Vaida F, Xu R. Proportional hazards model with random effects. Stat Med. 2000;19:3309–3324.
92. Yau KK. Multilevel models for survival analysis with random effects. Biometrics. 2001;57:96–102.
93. Lui K-J, Mayer JA, Eckhardt L. Confidence intervals for the risk ratio under cluster sampling based on the beta-binomial model. Stat Med. 2000;19:2933–2942.
94. Bennett S, Parpia T, Hayes R, Cousens S. Methods for the analysis of incidence rates in cluster randomized trials. Int J Epidemiol. 2002;31:839–846.
95. Gail MH, Byar D, Pechacek TF, Corle D. Aspects of statistical design for the Community Intervention Trial for Smoking Cessation (COMMIT). Control Clin Trials. 1992;13:6–21.
96. COMMIT Research Group. Community Intervention Trial for Smoking Cessation (COMMIT): I. Cohort results from a four-year community intervention. Am J Public Health. 1995;85:183–192.
97. COMMIT Research Group. Community Intervention Trial for Smoking Cessation (COMMIT): II. Changes in adult cigarette smoking prevalence. Am J Public Health. 1995;85:193–200.
98. Gail MH, Mark SD, Carroll RJ, Green SB, Pee D. On design considerations and randomization-based inference for community intervention trials. Stat Med. 1996;15:1069–1092.
99. Braun T, Feng Z. Optimal permutation tests for the analysis of group randomized trials. J Am Stat Assoc. 2001;96:1424–1432.
100. LaVange LM, Koch GG, Schwartz TA. Applying sample survey methods to clinical trials data. Stat Med. 2001;20:2609–2623.
101. Korn EL, Graubard BI. Analysis of Health Surveys. New York, NY: John Wiley & Sons Inc; 1999.
102. Muthen BO. Beyond SEM: general latent variable modeling. Behaviormetrika. 2002;29:81–117.
103. Schulenberg J, Maggs JL. Moving targets: modeling developmental trajectories of adolescent alcohol misuse, individual and peer risk factors, and intervention effects. Appl Dev Sci. 2001;5:237–253.
104. Muthen BO, Curran PJ. General longitudinal modeling of individual differences in experimental designs: a latent variable framework for analysis and power estimation. Psychol Methods. 1997;2:371–402.
105. Curran PJ, Muthen BO. The application of latent curve analysis to testing developmental theories in intervention research. Am J Community Psychol. 1999;27:567–595.
106. Hser Y-I, Shen H, Chuang C-P, Messer SC, Anglin MD. Analytic approaches for assessing long-term treatment effects. Eval Rev. 2001;25:233–262.
107. Davidian M, Giltinan DM. Nonlinear Models for Repeated Measurement Data. London, England: Chapman & Hall; 1995.
108. Vonesh EF, Chinchilli VM. Linear and Nonlinear Models for the Analysis of Repeated Measurements. New York, NY: Marcel Dekker; 1997.
109. Gruenewald PJ. Analysis approaches to community evaluation. Eval Rev. 1997;21:209–230.
110. Biglan A, Ary D, Wagenaar AC. The value of interrupted time-series experiments for community intervention research. Prev Sci. 2000;1:31–49.
111. Krull J, MacKinnon DP. Multilevel mediation modeling in group-based intervention studies. Eval Rev. 1999;23:418–444.
112. MacKinnon DP, Taborga MP, Morgan-Lopez AA. Mediation designs for tobacco prevention research. Drug Alcohol Depend. 2002;68:S69–S83.
113. Yi GY, Cook RJ. Marginal methods for incomplete longitudinal data arising in clusters. J Am Stat Assoc. 2002;97:1071–1080.
114. Hunsberger S, Murray DM, Davis CE, Fabsitz R. Imputation strategies for missing data in a school based multicenter study: the Pathways study. Stat Med. 2001;20:305–316.
115. Zhou Z-H, Perkins AJ, Hui SL. Comparisons of software packages for generalized linear multilevel models. Am Statistician. 1999;53:282–290.
116. Bryk AS, Raudenbush SW. Hierarchical Linear Models: Applications and Data Analysis Methods. Newbury Park, Calif: Sage Publications; 1992.
117. SAS/STAT User's Guide, Version 8. Cary, NC: SAS Institute Inc; 1999.
118. Littell RC, Milliken GA, Stroup WW, Wolfinger RD. SAS System for MIXED Models. Cary, NC: SAS Institute Inc; 1996.
119. Singer J. Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. J Educ Behav Stat. 1998;24:322–354.
120. Wolfinger RD. Fitting nonlinear mixed models with the new NLMIXED procedure. Paper presented at: 24th Annual SAS Users Group International Conference, April 1999, Miami, Fla.
121. SAS/STAT Software: Changes and Enhancements, Release 8.1. Cary, NC: SAS Institute Inc; 2000.
122. Hedeker D, Gibbons RD. MIXOR: a computer program for mixed-effects ordinal regression analysis. Comput Methods Programs Biomed. 1996;49:157–176.
123. Hedeker D, Gibbons RD. MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors. Comput Methods Programs Biomed. 1996;49:229–252.
124. Hedeker D. MIXNO: a computer program for mixed-effects nominal logistic regression. J Stat Software. 1999;4(5):1–92.
125. Hedeker D, Gibbons RD. A random-effects ordinal regression model for multilevel analysis. Biometrics. 1994;50:933–944.
126. Hedeker D. A mixed-effects multinomial logistic regression model. Stat Med. 2003;22:1433–1446.
127. Goldstein H, Browne W, Rasbash J. Multilevel modelling of medical data. Stat Med. 2002;21:3291–3315.
128. MIXED. In: SPSS 11.0 Syntax Reference Guide. Chicago, Ill: SPSS Inc; 2002:136–151.
129. Baskerville NB, Hogg W, Lemelin J. The effect of cluster randomization on sample size in prevention research. J Fam Pract. 2001;50:242–246.
130. Campbell MK, Mollison J, Grimshaw JM. Cluster trials in implementation research: estimation of intracluster correlation coefficients and sample size. Stat Med. 2001;20:391–399.
131. Piaggio G, Carroli G, Villar J, et al. Methodological considerations on the design and analysis of an equivalence stratified cluster randomization trial. Stat Med. 2001;20:401–416.
132. Smeeth L, Ng ES-W. Intraclass correlation coefficients for cluster randomized trials in primary care: data from the MRC trial of the assessment and management of older people in the community. Control Clin Trials. 2002;23:409–421.
133. Gulliford MC, Ukoumunne OC, Chinn S. Components of variance and intraclass correlations for the design of community-based surveys and intervention studies. Am J Epidemiol. 1999;149:876–883.
134. Scheier LM, Griffin KW, Doyle MM, Botvin GJ. Estimates of intragroup dependence for drug use and skill measures in school-based drug abuse prevention trials: an empirical study of three independent samples. Health Educ Behav. 2002;29:83–101.
135. Murray DM, Phillips GA, Birnbaum AS, Lytle LA. Intraclass correlation for measures from a school-based nutrition intervention study: estimates, correlates and applications. Health Educ Behav. 2001;28:666–679.
136. Murray DM, Alfano CM, Zbikowski SM, Padgett LS, Robinson LA, Klesges R. Intraclass correlation among measures related to cigarette use by adolescents: estimates from an urban and largely African American cohort. Addict Behav. 2002;27:509–527.
137. Lazovich D, Murray DM, Brosseau LM, Parker DL, Milton FT, Dugan SK. Sample size considerations for studies of intervention efficacy in the occupational setting. Ann Occup Hyg. 2002;46:219–227.
138. Martinson BC, Murray DM, Jeffery RW, Hennrikus DJ. Intraclass correlation for measures from a worksite health promotion study: estimates, correlates and applications. Am J Health Promotion. 1999;13:347–357.