Objectives. To examine the role that bots play in spreading vaccine information on Twitter by measuring exposure and engagement among active users from the United States.
Methods. We sampled 53 188 US Twitter users and examined who they follow and retweet across 21 million vaccine-related tweets (January 12, 2017–December 3, 2019). Our analyses compared bots to human-operated accounts and vaccine-critical tweets to other vaccine-related tweets.
Results. The median number of potential exposures to vaccine-related tweets per user was 757 (interquartile range [IQR] = 168–4435), of which 27 (IQR = 6–169) were vaccine critical, and 0 (IQR = 0–12) originated from bots. We found that 36.7% of users retweeted vaccine-related content, 4.5% retweeted vaccine-critical content, and 2.1% retweeted vaccine content from bots. Compared with other users, the 5.8% for whom vaccine-critical tweets made up most exposures more often retweeted vaccine content (62.9%; odds ratio [OR] = 2.9; 95% confidence interval [CI] = 2.7, 3.1), vaccine-critical content (35.0%; OR = 19.0; 95% CI = 17.3, 20.9), and bots (8.8%; OR = 5.4; 95% CI = 4.7, 6.3).
Conclusions. A small proportion of vaccine-critical information that reaches active US Twitter users comes from bots.
Vaccine misinformation—false information not supported by evidence—is believed to be common on social media,1 but much less is known about whether it is commonly encountered and who encounters it. Not all vaccine-critical content posted on social media is misinformation, but misinformation is common in vaccine-critical content.2 Misinformation can undermine confidence in vaccination and encourage hesitancy and refusal,3 which then may influence the number and severity of infectious disease outbreaks.4,5 The potential for misinformation to spread via social media platforms is a pressing question for governments and global agencies.
Information epidemics and the potential impact of poor-quality health information online have been discussed for more than 20 years.6 To understand the effect of misinformation on health behaviors and outcomes, we need to go beyond characterizing misinformation and measuring how quickly it spreads7,8 to measure the composition of what people are exposed to and engage with online. Information exposure is related to the concept of exposure in individual psychology studies on misinformation, and measures of information engagement are related to the concepts of salience of misinformation and how people choose to express their vaccine attitudes online.9 Because of its openness, Twitter is a convenient platform for estimating information exposure and engagement, but it has limitations. Potential exposure can be measured by observing social network structure, and engagement can be partially measured by observing how users pass on misinformation as retweets.
The Twitter information ecosystem is a complex mix of human and nonhuman actors posting or engaging with information for a range of purposes. Software agents (bots) that post on social media are an important type of nonhuman actor. Although there is evidence of their involvement in vaccination discourse and more broadly in public health on social media platforms,10–12 estimates of the size of their presence and their potential impact vary considerably. On Twitter, bots are accounts operated automatically to post, retweet, or reply; they vary in sophistication from simply reposting links to certain (often malicious) Web pages to masquerading as humans in an attempt to alter the discourse on a topic. A 2017 study estimated that between 9% and 15% of Twitter accounts are bots.13
A study that characterized vaccine-related tweets posted by trolls and bots on Twitter suggested that trolls and bots affect vaccine discourse10 but did not measure whether people ever saw or engaged with those tweets. Related research examining the potential effect of bots and fake news in politics suggested that bots play a minor role in potential influence, with humans more often responsible for spreading misinformation than bots.8,14
Despite the number of studies characterizing vaccine-related content on social media, none have provided reliable estimates of how often human social media users see or engage with bots on the topic. We sought to measure exposure to and engagement with vaccine information among active Twitter users in the United States and examine the role that bots might play in spreading vaccine-critical information on the platform.
We collected all tweets matching a set of vaccine-related keywords and posted by any Twitter account between January 12, 2017, and December 3, 2019. We labeled tweets as vaccine critical or otherwise by training a machine-learning classifier, and we identified bots among the accounts posting vaccine-related tweets. Although tweets expressing a negative opinion of vaccination are not necessarily misinformation, much of the vaccine-critical content available online is either inaccurate or makes claims unsupported by evidence.2
In parallel, we monitored a sample of active, human-operated Twitter accounts from the United States (users) to track potential exposure and engagement with vaccine-related content. Information exposure is challenging to measure at scale, so we used information about the accounts the users follow as a proxy. Similarly, engagement is multifaceted and could be defined by views, interactions, replies, retweets, quoting, or other actions, and we measured retweets under the assumption that they most closely relate to active engagement.
First, we sampled a set of Twitter users from the United States, requiring that these users be well established and active. We sampled accounts of people who had recently posted any tweet and used a heuristic based on previous studies examining users over time to target active and established users via checks on the number of followers, rate of posting, and proportion of retweets (Appendix, section 2 [available as a supplement to the online version of this article at http://www.ajph.org]). We then used a gazetteer, Nominatim (Appendix, section 3), to parse the profile locations of the accounts we judged to be active and well-established human users and infer a home location, and we included in the analysis those located in the United States.
We captured vaccine-related tweets posted by any public Twitter account via continuous keyword-filtered requests to Twitter servers. Keywords included all reasonable synonyms and variants for words related to vaccines, vaccination, and immunization (Appendix, section 1). We have previously evaluated the coverage of this approach and are confident that it captures all public tweets and retweets matching these keywords, consistent with current best practice for Twitter surveillance in public health applications.15
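As an illustration of this filtering step, keyword matching of this kind can be sketched as follows. The keyword subset and the case-insensitive substring rule here are assumptions for illustration only; the full keyword list appears in Appendix, section 1.

```python
import re

# Hypothetical subset of the vaccine-related keyword list (the full list
# appears in Appendix, section 1). Matching here is a case-insensitive
# substring test, so stems such as "vaccin" also catch "vaccines" and
# "vaccination".
KEYWORDS = ["vaccin", "immuniz", "immunis", "vaxx"]
PATTERN = re.compile("|".join(map(re.escape, KEYWORDS)), re.IGNORECASE)

def is_vaccine_related(text):
    """Return True if the tweet text matches any vaccine-related keyword."""
    return PATTERN.search(text) is not None
```

Because the match is on stems rather than whole words, this sketch favors recall over precision, in keeping with the goal of capturing all matching public tweets.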
We identified bots using Botometer,16 an established and validated method for identifying Twitter accounts likely to be bots (Appendix, section 4.1). Because of daily limits on the service, we collected bot scores intermittently, after accounts first posted a vaccine-related tweet during the study period. Bot scores vary between 0 and 1, where scores closer to 1 indicate a higher likelihood that a user is a bot. The typical approach to identifying bots among a population of Twitter accounts is to use a simple threshold, where any account with a score of 0.5 or higher is labeled as a bot.8,13,17
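The thresholding rule reduces to a simple comparison, sketched below. The data layout and account names are hypothetical; Botometer's actual API returns richer output than a single score.

```python
# Sketch of the bot-labeling rule described above: any account whose bot
# score is 0.5 or higher is labeled a bot. Scores here are hypothetical.
BOT_THRESHOLD = 0.5

def label_bots(scores):
    """Map {account_id: score in [0, 1]} to {account_id: is_bot}."""
    return {acct: score >= BOT_THRESHOLD for acct, score in scores.items()}

labels = label_bots({"acct_a": 0.92, "acct_b": 0.07, "acct_c": 0.50})
# acct_a and acct_c are labeled bots; acct_b is labeled human-operated.
```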
We used a supervised machine-learning method to identify vaccine-critical tweets. We created training data with help from Amazon Mechanical Turk workers, who were asked to label tweets based on whether they were vaccine critical. We sampled 10 000 vaccine-related tweets from 10 000 distinct accounts from the complete set of vaccine-related tweets, and we used labels from multiple workers to train a classifier (Appendix, section 4.2). We then applied the best-performing classifier to all vaccine-related tweets to label them as vaccine critical or otherwise.
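One step of this pipeline that a short sketch can clarify is collapsing multiple worker labels per tweet into a single training label. A simple majority vote is one plausible scheme; the aggregation actually used, along with the classifier details, is described in Appendix, section 4.2.

```python
from collections import Counter

def aggregate_labels(worker_labels):
    """Collapse per-tweet worker labels ('critical'/'other') by majority
    vote. Ties fall back to 'other'; the tie rule is an assumption."""
    consensus = {}
    for tweet_id, labels in worker_labels.items():
        counts = Counter(labels)
        consensus[tweet_id] = (
            "critical" if counts["critical"] > counts["other"] else "other"
        )
    return consensus

# Hypothetical worker labels for two tweets.
votes = {
    "t1": ["critical", "critical", "other"],
    "t2": ["other", "other", "critical"],
}
consensus = aggregate_labels(votes)
# t1 -> "critical", t2 -> "other"
```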
We measured exposure based on the Twitter accounts that users followed, collecting the list of followed accounts once per user, shortly after we first sampled that user. We counted any vaccine-related tweet or retweet posted during the study period by an account a user followed as a potential exposure, provided the user was following that account at the time the follower information was collected.
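Under this definition, the per-user exposure count reduces to counting vaccine-related tweets whose author appears in the user's followed-account snapshot. A minimal sketch, with an assumed data layout:

```python
def potential_exposures(followed, vaccine_tweets):
    """Count potential exposures for one user.

    followed: set of account ids the user follows (a one-time snapshot);
    vaccine_tweets: iterable of (tweet_id, author_id) pairs for
    vaccine-related tweets and retweets posted during the study period.
    """
    return sum(1 for _tid, author in vaccine_tweets if author in followed)

# Hypothetical data: this user follows 2 accounts, and 2 of the 3
# vaccine-related tweets were posted by accounts the user follows.
followed = {"acct_health", "acct_friend"}
tweets = [("t1", "acct_health"), ("t2", "acct_other"), ("t3", "acct_friend")]
```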
We measured engagement by identifying vaccine-related tweets that were retweeted by users. We looked for the users’ identifiers in the set of 21 million vaccine-related tweets and retweets (Appendix, section 1).
In our primary analysis, we focused on descriptive characterizations of the frequency and distribution of exposures and engagements for bots relative to human-operated accounts and for vaccine-critical relative to other vaccine-related tweets. Because we did not attempt to infer the demographics of the randomly sampled users, we were unable to measure or adjust for demographic differences.
We conducted a post hoc analysis of the set of users for whom potential exposures to vaccine-critical tweets were at least half of the total number of potential exposures to vaccine-related tweets. For this subgroup of users, we compared engagement with bots and vaccine-critical tweets against their counterpart users for whom vaccine-critical tweets made up less than half of their exposures. Differences are reported as unadjusted odds ratios (ORs) with 95% confidence intervals (CIs).
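An unadjusted OR with a 95% CI can be computed from a 2 × 2 table of counts; the sketch below uses the standard log-scale (Wald-type) interval, an assumption, because the article does not state which CI method was used. As a check, it reproduces the bot-retweeting comparison reported in the Results (271 of 3081 subgroup users vs 825 of 47 513 other users) up to rounding.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald-type 95% CI on the log scale.

    a, b: events and non-events in the subgroup of interest;
    c, d: events and non-events in the comparison group.
    """
    or_ = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Counts from the Results: 271 of 3081 subgroup users retweeted a bot at
# least once, vs 825 of 47 513 other users.
or_, lo, hi = odds_ratio_ci(271, 3081 - 271, 825, 47513 - 825)
# or_ ~ 5.46, 95% CI ~ (4.73, 6.29): consistent with the reported
# OR = 5.4 (95% CI = 4.7, 6.3) up to rounding.
```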
The study included 53 188 Twitter users in the United States, whom we sampled independently of whether they were exposed to or shared vaccine-related content (Figure 1). These users were distributed across the United States; the most common user locations were California (12.3%), New York (9.2%), and Texas (9.1%; Appendix, section 3). Of the 5 124 906 scored accounts tweeting or retweeting about vaccines during the study period, 2 121 315 accounts were followed by 1 or more of the 53 188 users.

FIGURE 1— Flow of Vaccine-Related Tweets and the Users Who Posted Them Through the Analysis: United States, 2017–2019
Note. A random sample of 53 188 users from the United States was examined for their potential interaction with more than 20 million vaccine-related tweets posted by human-operated and bot accounts. For more information on how the data were constructed and what the relationships were, see the Appendix (available as a supplement to this article at https://www.ajph.org).
In terms of the frequency of potential exposures, we found that users were potentially exposed to a median of 757 (interquartile range [IQR] = 168–4435) vaccine-related tweets, a median of 27 vaccine-critical tweets (IQR = 6–169), and a median of 0 vaccine-related tweets from bots (IQR = 0–12). The results indicate that for most users, exposure to vaccine-critical content was relatively infrequent and that exposure to bots was extremely infrequent (Figure 2).

FIGURE 2— Per User Percentage of (a) Potential Exposures to Bots vs Human-Operated Accounts, and (b) Vaccine-Critical Tweets vs Other Vaccine-Related Tweets: United States, 2017–2019
Note. Potential exposure to antivaccine tweets and bots was relatively uncommon and unevenly distributed among users. Users are in descending order by proportional exposure to bots (orange, part a) or vaccine-critical tweets (orange, part b) and in descending order by proportional exposure to human-operated accounts (cyan, part a) or other vaccine-related content (cyan, part b).
Exposure to bots was rare and unevenly distributed across users (Figure 2); 42.0% of users may have been exposed at least once to a vaccine-related tweet originally posted by a bot, because either they followed a bot account that posted a vaccine-related tweet or an account they followed retweeted a vaccine-related tweet posted by a bot account. However, posts from bots made up a small percentage of vaccine-related exposures; the median percentage of exposures originating from bots was 0.0% (IQR = 0.0%–0.5%). Bot accounts were responsible for at least half of the potential exposures to vaccine-related tweets for less than 0.06% of users.
Exposure to vaccine-critical tweets was also relatively rare and unevenly distributed across users (Figure 2). As a proportion of exposures, vaccine-critical tweets made up a median of 3.2% (IQR = 1.4%–9.2%) of all vaccine-related exposures. Vaccine-critical tweets made up at least half of vaccine-related exposures for 5.8% of users. The results indicate that although most users may have seen a vaccine-critical tweet, only 1 in 20 users were more often exposed to vaccine-critical content than other vaccine-related content.
Counting original vaccine-related tweets and retweets of vaccine-related tweets together, we found that the median number of times a user posted or retweeted about vaccines was 0 (IQR = 0–2) and that 36.7% of users posted or retweeted vaccine-related content at least once during the study period. Few users actively engaged with vaccine-related tweets; 1.9% of users engaged with vaccine-related tweets at least once per month on average.
Retweeting bots was uncommon. Just 2.1% of users retweeted a bot at least once during the study period, compared with 27.1% of users who retweeted vaccine-related tweets from human-operated accounts at least once (Figure 3).

FIGURE 3— Per User Percentage of (a) Retweets of Bots vs Human-Operated Accounts, and (b) Retweets of Vaccine-Critical Tweets vs Other Vaccine-Related Tweets: United States, 2017–2019
Note. Engagement with antivaccine tweets and bots was rare and unevenly distributed among users. Users are in descending order by per user percentage engagement with bots (orange, part a) or antivaccine tweets (orange, part b) and descending order by per user percentage engagement with human-operated accounts (cyan, part a) or other vaccine-related content (cyan, part b).
Retweeting vaccine-critical content was relatively uncommon. Just 4.5% of users retweeted a vaccine-critical tweet at least once, compared with 26.2% of users who retweeted other vaccine-related tweets at least once (Figure 3).
Engagement with bots and vaccine-critical tweets was unevenly distributed across users (Figure 3). For 2.6% of users, vaccine-critical tweets made up at least half of their vaccine-related retweets during the study period. Bots made up at least half of the set of vaccine-related retweets for just 0.4% of users.
We further analyzed the 5.8% of users for whom at least half of their potential exposures to vaccine-related content were vaccine-critical tweets. The median potential exposure count among the 3086 users was 30 709 (IQR = 8795–65 091) compared with 750 (IQR = 195–3698) for other users, indicating that users in this subgroup were more often exposed to any type of vaccine-related tweets than their counterparts. The median percentage of exposures to bots among the 3086 users was 6.2% (IQR = 2.2%–9.1%) compared with 0.0% (IQR = 0.0%–0.4%) among other users, indicating that bots made up a greater proportion of what these users may have seen.
Users from this subgroup were more likely to engage with vaccine-related posts in general, suggesting that they were more engaged with vaccines and vaccinations as a topic. In this subgroup, 62.9% (1940 of 3086) retweeted vaccine-related content at least once in the study period compared with 36.9% (17 553 of 47 513) of other users (OR = 2.9; 95% CI = 2.7, 3.1). The median number of posts or retweets among the subpopulation was 2 (IQR = 0–6), compared with 0 (IQR = 0–2) from other users.
Users from this subgroup were also more likely to retweet bots and vaccine-critical content than were other users. The percentage of users from this subgroup who retweeted bots at least once was 8.8% (271 of 3081) compared with 1.7% (825 of 47 513) of other users (OR = 5.4; 95% CI = 4.7, 6.3). The percentage of users from this subgroup who retweeted vaccine-critical tweets at least once was 35.0% (1081 of 3086) compared with 2.8% (1310 of 47 513) of other users (OR = 19.0; 95% CI = 17.3, 20.9).
These results show that 5.8% of Twitter users in the United States are embedded in communities where exposure to vaccine-critical content is common. These users differ from other Twitter users in the United States in that they tend to engage with the topic more often and are more likely to share vaccine-critical content. Although they are also more likely than other users to share content from bots, bots still accounted for a small proportion of what they read or shared.
Twitter users in the United States were frequently exposed to information about vaccines between January 12, 2017, and December 3, 2019. More than a third of users also engaged in discussion about the topic by posting or retweeting vaccine information, but this engagement was relatively infrequent for most users. Engagement with any vaccine-related tweets, vaccine-critical tweets, and bots was higher in the 5.8% of users who were embedded in communities where vaccine-critical content was common. The overwhelming majority of the vaccine-related content seen by typical users in the United States was generated by human-operated accounts, not bots. The results show that bots played little to no role in shaping vaccine discourse among Twitter users in the United States.
Consistent with other literature in the area, we found that a small proportion of Twitter users were embedded in communities where vaccine-critical content was shared more than other vaccine content. Compared with other users, these users were more likely to have posted or retweeted about vaccines, and more of the vaccine-related tweets they posted were vaccine critical. These results indicate that engagement with vaccine-critical information is concentrated in certain communities. This is consistent with the findings of studies examining community structure and information exposure in discussions of human papillomavirus vaccines on Twitter,18–20 studies of news media coverage of vaccinations,21 and studies on exposure to and engagement with political fake news.14 Although this study is not directly comparable with studies that characterize vaccine-related posts from bots and trolls,10 our results suggest that conclusions drawn about the importance of bots in shaping the discourse on social media may be overstated.
We found that Twitter users in the United States rarely share vaccine-related content posted by bots. A 2017 study examined follower relationships between human users and bots on Twitter, estimating that between 9% and 15% of accounts are bots, that human users mostly form connections with other human users, and that there is little reciprocity (humans following bot accounts that follow them).13 Although we measured engagement in a more direct way, our results are generally consistent with these findings. Another study examining the spread of low-credibility content suggests that human users retweet articles posted by bots almost as much as they retweet other humans, although it appears that what drives amplification is the volume of retweets the content has regardless of the provenance of those retweets.17 This highlights the importance of estimating exposure and engagement in populations of information consumers rather than speculating about impact by counting posts.
The tool we used to detect bot accounts may be imperfect, and some users may have been misclassified. However, we used a standard threshold common to previous studies that show the tool rarely misclassifies accounts.13,16,17 Some bots may have been deleted or suspended after posting about vaccines and before we could capture their bot score. Of the 5.28 million accounts posting about vaccines, we did not score 0.1 million accounts (2.4%) because they became unavailable in the period between identifying the account and checking its score. It is possible that a greater proportion of those accounts were bots compared with the accounts that were included in the analysis. Given the small number of unscored accounts, however, this difference is unlikely to have affected the conclusions.
We did not investigate the full complexity of the information landscape on Twitter. For example, we cannot draw any conclusions about trolls—human-operated Twitter accounts that use a range of approaches to gain followers and may post misinformation. New studies would benefit from measuring the potential effect of trolls via exposure and engagement with trolls in a robust sample of human users.
Our method for detecting vaccine-critical tweets had high accuracy but was imperfect, which means that we likely misclassified a small proportion of posts. However, because the classifier was designed to maximize recall over precision (Appendix), we were more likely to overestimate than underestimate the number of vaccine-critical tweets.
Measures of potential exposure are approximations based on the structure of the follower network and do not capture the algorithms that Twitter uses to deliver posts to users; advertising and recent changes that surface tweets from accounts that users do not follow may therefore make potential exposure estimates less reliable. Estimates of potential exposure may also have been affected by our inability to continually update information about who users follow on Twitter. It was only feasible to collect information about who the users followed after they were identified, and users are likely to have followed or unfollowed other users during the study period. However, users are still much more likely to be exposed to tweets from the accounts they follow, so the measure of potential exposure is likely to be a reasonable proxy for information consumption.
Our results have implications for public health practice and can be used to inform approaches for addressing vaccine misinformation.7,22 Two proposed approaches are (1) tools to help social media platforms identify misinformation (a precursor to removing, hiding, or algorithmically reducing exposure), and (2) interventions designed to empower users to critically appraise the information they see.7 Critical appraisal skills necessary to distinguish between credible information and misinformation vary,23 and tools that could be used to support the critical consumption of vaccine-related information might offer insights into the techniques and topics used in antivaccine arguments.24,25 Our results suggest that allocating resources to eliminating bots may be less effective than providing tools to improve media literacy and developing personalized communications interventions targeted at communities where vaccine-critical content is most common.26 Strategies that focus on limiting the impact of influential accounts spreading misinformation are warranted—an approach aligned with how public health organizations make decisions about how and when to engage with vaccine-critical content.27
Our analysis also has implications for research practice and reporting. Beyond counting and characterizing the content posted on social media platforms, it is important to consider that not all posts have the same effect. Posts vary in terms of reach and engagement, and phenomena such as echo chambers mean that certain content is shared and consumed mostly in specific communities of often like-minded people. Measuring and separating out the mechanisms of homophily, contagion, and external drivers of network dynamics require studies that involve social media users as participants.9,28,29
Understanding how those messages influence vaccination uptake requires considering them in the broader context of how vaccination decisions are made. Although experimental studies of the direct impact of vaccination messages online show a capacity to change beliefs and intentions temporarily,30 studies that involve people who choose not to vaccinate their children indicate that core beliefs about health and parenting experiences in the health care system are central.31,32 Given that lack of vaccination tends to be clustered, it is likely that social network structures play a role in the process.
Further studies are needed to better understand the gaps between what can be observed about people online and the decisions they make about vaccination offline. Despite some early examples across several social media platforms,33–35 social media research rarely connects measures of information expression or exposure from social media data to individual attitudes and behaviors measured by surveys or using medical records. We recommend further studies that can reconcile the link between online and offline behavior to improve the translation of social media research.
Active Twitter users in the United States are frequently exposed to vaccine-related content, but most users never or infrequently engage. For nearly all users, bots are responsible for a small proportion of the vaccine-related content users see, and engagement with bots is negligible. Exposure to and engagement with vaccine-critical content tend to be most heavily concentrated in a relatively small subgroup of users who are more engaged with the topic overall. Researchers studying health information consumption should consider measuring exposure and sharing in representative populations to better understand the potential effect of what is being posted. Rather than focusing efforts on bots, social media platforms, policymakers, and public health agencies should continue to focus on the known factors influencing vaccination-related behaviors.
See also Chou and Gaysynsky, p.
ACKNOWLEDGMENTS
This work was funded by the National Health and Medical Research Council, Australia (grant APP1128968).
CONFLICTS OF INTEREST
The authors have no conflicts of interest to declare.
HUMAN PARTICIPANT PROTECTION
Human ethics approval for the research was granted by Macquarie University (52019614312780).