Privacy Protection Versus Cluster Detection in Spatial Epidemiology
Karen L. Olson, PhD,
Shaun J. Grannis, MD, MS and
Kenneth D. Mandl, MD, MPH
Karen L. Olson and Kenneth D. Mandl are with the Children's Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Children's Hospital Boston, Boston, Mass; the Division of Emergency Medicine, Children's Hospital Boston; and the Department of Pediatrics, Harvard Medical School, Boston. Shaun J. Grannis is with Regenstrief Institute Inc and the Indiana University School of Medicine, Indianapolis.
FIGURE 1—Simulated spatial cluster analyzed through the use of (a) exact locations or (b) zip code centroids.
Note. Ten simulated cluster points were inserted into authentic emergency department data. The simulated points (small dark circles) were randomly scattered within a 1-km circle (solid line) and fell into 2 zip codes of 5 points each. The simulated cluster was located along a circle (dotted line) with a radius of 5 km that was centered at the study hospital. For this figure, points representing patient addresses were moved random short distances from their true locations. Points analyzed as exact latitude and longitude coordinates are shown in part a. The cluster identified by SaTScan contained the 10 simulated points and 4 additional points (small open circles) from the hospital data. For the data shown in part b, points were analyzed as zip code centroids, but exact locations are pictured. The cluster identified by SaTScan contained the 10 simulated points and 18 additional points (small open circles) from the hospital data.
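The placement of the simulated cluster described in the note above can be illustrated with a minimal sketch. It is not the authors' code. It assumes planar coordinates in kilometers, a cluster center chosen at a random bearing on the 5-km circle, and points spread uniformly by area within the 1-km circle; the note says only that the points were "randomly scattered," so the uniform distribution is an assumption. The function name `simulate_cluster` and its parameters are hypothetical.

```python
import math
import random


def simulate_cluster(hospital_xy, ring_radius_km=5.0, cluster_radius_km=1.0,
                     n_points=10, rng=None):
    """Return (center, points) for one simulated cluster in planar km coordinates.

    Assumptions (not stated in the source): the bearing of the cluster center
    is random, and points are uniform by area inside the cluster circle.
    """
    rng = rng or random.Random()
    hx, hy = hospital_xy

    # Place the cluster center on the circle of radius ring_radius_km
    # around the hospital, at a random bearing.
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    cx = hx + ring_radius_km * math.cos(bearing)
    cy = hy + ring_radius_km * math.sin(bearing)

    points = []
    for _ in range(n_points):
        # Taking the square root of a uniform variate spreads points
        # evenly over the disc's area rather than crowding the center.
        r = cluster_radius_km * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return (cx, cy), points


if __name__ == "__main__":
    center, pts = simulate_cluster((0.0, 0.0), rng=random.Random(0))
    print("cluster center (km):", center)
    for x, y in pts:
        print(f"{x:.3f}, {y:.3f}")
```

In a real analysis the planar offsets would still need to be converted to latitude and longitude before the points could be merged with the emergency department data; that step is omitted here.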
FIGURE 2—Significant SaTScan clusters that contained at least half of the simulated cluster points, by type of administrative region and radius size of the simulated cluster: zip code area, 0.5-km radius (a); zip code area, 1-km radius (b); zip code area, 2-km radius (c); zip code area, 3-km radius (d); census tract, 0.5-km radius (e); and census tract, 1-km radius (f).
Note. Pairs of bars compare clusters identified by SaTScan when points were analyzed at two levels of address precision, as either exact x-y coordinates or as centroids of administrative regions (zip codes or census tracts). The height of the bars indicates the number of significant clusters that contained at least half (5–10) of the simulated points that were inserted into the data. Bands within the bars indicate the number of additional points in the clusters that came from the background emergency department visits. Number of regions refers to the number of administrative regions into which the simulated cluster points fell.