Policies to promote public health and welfare often fail or worsen the problems they are intended to solve. Evidence-based learning should prevent such policy resistance, but learning in complex systems is often weak and slow. Complexity hinders our ability to discover the delayed and distal impacts of interventions, generating unintended “side effects.” Yet learning often fails even when strong evidence is available: common mental models lead to erroneous but self-confirming inferences, allowing harmful beliefs and behaviors to persist and undermining implementation of beneficial policies.

Here I show how systems thinking and simulation modeling can help expand the boundaries of our mental models, enhance our ability to generate and learn from evidence, and catalyze effective change in public health and beyond.

The United States spends more on health care than any other nation (15.3% of gross domestic product [GDP] in 2003, up from 5.1% in 1960).1,2 Yet the return on this huge investment is discouraging: the United States ranks 33rd in life expectancy and 35th in infant mortality.2 More than 40 million have no health insurance. Minorities and the poor have significantly lower life expectancy than others.3 Nearly two thirds of US adults are overweight, and almost one third are obese.4 Diabetes and cardiovascular disease are rampant. The number of unhealthy days Americans experience is growing.5 Preventable medical errors kill tens of thousands each year.6 From Staphylococcus aureus to malaria to HIV, morbidity and mortality from drug-resistant pathogens grows.7 Most disturbing, many of these afflictions are the unintended consequences of the extraordinary prosperity and technical progress that enabled us to treat disease and decrease daily toil so successfully over the past century.

Health care is not unique. Thoughtful leaders throughout society increasingly suspect that the policies we implement to address difficult challenges have not only failed to solve the persistent problems we face, but are in fact causing them. All too often, well-intentioned programs create unanticipated “side effects.” The result is policy resistance, the tendency for interventions to be defeated by the system’s response to the intervention itself. From overuse of antibiotics that spread resistant pathogens, to the obesity caused by the sedentary lifestyles and cheap calories our prosperity affords, our best efforts to solve problems often make them worse (see the box “Examples of Policy Resistance”).

Examples of Policy Resistance

Road building programs designed to reduce congestion have increased traffic, delays, and pollution.9

Low tar and nicotine cigarettes actually increase intake of carcinogens, carbon monoxide, etc., as smokers compensate for the low nicotine content by smoking more cigarettes per day, by taking longer, more frequent drags, and by holding the smoke in their lungs longer.14

Health plan policies “limiting what drugs can be prescribed—intended to prevent the unnecessary use of expensive drugs—are having the unintended effect of raising medical costs.”67

Antilock brakes and other automotive safety devices cause some people to drive more aggressively, partially offsetting their benefits.68

The war on drugs, focusing on interdiction and supply disruption, has had only a small impact on narcotics trafficking. Drug use in America and elsewhere remains high.69

Forest fire suppression causes greater tree density and fuel accumulation, leading to larger, hotter, and more dangerous fires, often consuming trees that previously survived smaller fires unharmed.70

Flood control efforts, such as levee and dam construction, have led to more severe floods by preventing the natural dissipation of excess water in flood plains. The cost of flood damage has increased as flood plains were populated in the belief they were safe.9

Antibiotics have stimulated the evolution of drug-resistant pathogens, including multidrug-resistant strains of tuberculosis, Staphylococcus aureus, and sexually transmitted diseases.7

Pesticides and herbicides have stimulated the evolution of resistant pests, killed off natural predators, and accumulated up the food chain to poison fish, birds, and, in some cases, humans.71

Highly active antiretroviral treatment has dramatically reduced mortality among those living with HIV, but has increased risky behaviors, including unprotected sex and substance abuse, among youth and other groups, causing a rebound in incidence while multiply-resistant strains of HIV proliferate.72

Despite dramatic gains in income per capita and widespread use of labor-saving technology, Americans have less leisure today than 50 years ago and are no happier.28

Policy resistance arises from a narrow, reductionist worldview. We have been trained to view our situation as the result of forces outside ourselves, forces largely unpredictable and uncontrollable. Consider the “unanticipated events” and “side effects” so often invoked to explain policy failure. Political leaders blame recession on corporate fraud or terrorism. Managers blame bankruptcy on events outside their organizations and (they want us to believe) outside their control. But there are no side effects—just effects. Those we expected or that prove beneficial we call the main effects and claim credit. Those that undercut our policies and cause harm we claim to be side effects, hoping to excuse the failure of our intervention. “Side effects” are not a feature of reality, but a sign that the boundaries of our mental models are too narrow, our time horizons too short.

For many, the solution is obvious: the continued application of the scientific method. The diligent adherence to scientific method, in this view, is responsible for the great advances of medicine and public health, from the Broad Street pump incident, where John Snow proved that cholera was a water-borne disease, to the latest double-blind prospective randomized clinical trial, and is the most reliable way to generate the evidence needed to improve health policy. There are, however, three fundamental impediments to this goal: the complexity problem, learning failures, and the implementation challenge.

I discuss these challenges to learning from evidence in complex settings, showing how policy resistance arises from the mismatch between the complexity of the systems we have created and our capacity to understand them. I describe methods for systems thinking and formal modeling that have proven to be useful, focusing on the field of system dynamics.8,9 Readers interested in learning more about system dynamics and successful applications in health policy and other domains should refer to Homer and Hirsch10 and Jones et al.11 (in this issue of the Journal) and the growing scholarly and practitioner literature.9,12–16

Generating reliable evidence through scientific method requires the ability to conduct controlled experiments, discriminate among rival hypotheses, and replicate results. But the more complex the phenomenon, the more difficult are these tasks. Medical interventions and health policies are embedded in intricate networks of physical, biological, ecological, technical, economic, social, political, and other relationships. Experiments in complex human systems are often unethical or simply infeasible (we cannot release smallpox to test policies to thwart bioterrorists). Replication is difficult or impossible (we have only one climate and cannot compare a high–greenhouse gas [GHG] future to a low one). Decisions taken in one part of the system ripple out across geographic and disciplinary boundaries. Long time delays mean we never experience the full consequences of our actions.17 Follow-up studies must be carried out over decades or lifetimes, while at the same time changing conditions may render the results irrelevant. Complexity hinders the generation of evidence.

Learning often fails even when reliable evidence is available. More than two and a half centuries passed from the first demonstration that citrus fruits prevent scurvy until citrus use was mandated in the British merchant marine, despite the importance of the problem and unambiguous evidence supplied by controlled experiments.18 Some argue that today we are smarter and learn faster. Yet adoption of medical treatments varies widely across regions, socioeconomic strata, and nations, indicating either overuse by some or underuse by others, despite access to the same evidence on risks and benefits.19,20 Although economic theory suggests market forces, publications, benchmarking, training, and imitation should cause performance to converge to optimal levels, many studies document large, persistent differences in performance across organizations.21 Consider cystic fibrosis. Conditions for learning are excellent: the stakes are literally life and death. The Cystic Fibrosis Foundation, NIH, and medical schools conduct research, collect clinical evidence, train specialists, and disseminate best practices. Yet although life expectancy for cystic fibrosis patients has risen significantly over the past decades, large performance differences persist across treatment centers.22 At the same time that many beneficial innovations diffuse slowly and unevenly, widely held superstitions remain immune to evidence (e.g., copper bracelets for arthritis treatment, “feed a cold, starve a fever,” astrology). Many of the heuristics we use to interpret evidence lead to systematically erroneous but strongly self-confirming inferences. Complexity hinders learning from evidence.

Many scientists respond to the complexity and learning problems by arguing that policy should be left to the experts. But this “Manhattan Project” approach (where experts secretly provide advice to inform decisions made without consulting the public or their elected representatives) fails when success requires behavior change throughout society. Effective interventions for problems from HIV/AIDS to global warming require changes in the beliefs and behaviors of a large majority of the population, supported by complementary changes in education, incentives, and institutions. Decisions once taken by experts are now seen to affect multiple stakeholders, the public at large, and future generations. People are often suspicious of experts and their evidence, believing—often with just cause—that those with power and authority routinely manipulate the policy process for ideological, political, or pecuniary purposes.23 Unable to assess the reliability of evidence about complex issues on their own, and frequently excluded from the policy process, citizen noncompliance and active resistance grow, from motorcycle helmet laws to measles–mumps–rubella immunization.24,25 Complexity hinders the implementation of policies on the basis of evidence.

Most people define complexity in terms of the number of components or possible states in a system. In pharmaceutical development, for example, optimally screening new compounds for therapeutic activity is highly complex, but the complexity lies in finding the best solution out of an astronomical number of possibilities. Such needle-in-a-haystack problems have high levels of combinatorial complexity. However, most cases of policy resistance arise from dynamic complexity, the often counterintuitive behavior of complex systems that arises from the interactions of the agents over time.17 The box “Causes of Policy Resistance” describes some of the characteristics of complex systems. Where the world is dynamic, evolving, and interconnected, we tend to make decisions using mental models that are static, narrow, and reductionist. Among the elements of dynamic complexity people find most problematic are feedback, time delays, and stocks and flows.

Causes of Policy Resistance

Policy Resistance Arises Because Systems Are

Constantly changing. Bull markets can rise for years, then crash in a matter of hours.

Tightly coupled. The actors in a system interact strongly with one another and with the natural world. Everything is connected to everything else. “You can’t do just one thing.”

Governed by feedback. Because of the tight couplings among actors, our actions feed back on themselves. Our decisions alter the state of the world, causing changes in nature and triggering others to act, thus giving rise to a new situation, which then influences our next decisions.

Nonlinear. Effect is rarely proportional to cause, and what happens locally in a system (near the current operating point) often does not apply in distant regions (other states of the system). Nonlinearity often arises from basic physics: insufficient inventory may cause you to boost production, but production can never fall below zero no matter how much excess inventory you have. Nonlinearity also arises as multiple factors interact in decisionmaking: Pressure from the boss for greater achievement increases your motivation and effort—up to the point where you perceive the goal to be impossible. Frustration then dominates motivation—and you give up or get a new boss.

History-dependent. Many actions are irreversible: you can’t unscramble an egg (the second law of thermodynamics). Stocks and flows (accumulations) and long time delays often mean doing and undoing have fundamentally different time constants: during the 50 years of the Cold War arms race, the nuclear nations created more than 250 tons of weapons-grade plutonium (²³⁹Pu). The half-life of ²³⁹Pu is about 24,000 years.

Self-organizing. The dynamics of systems arise spontaneously from their internal structure. Often, small, random perturbations are amplified and molded by the feedback structure, generating patterns in space and time. The stripes on a zebra, the rhythmic contraction of your heart, and persistent cycles in measles and the real estate market all emerge spontaneously from the feedbacks among the agents and elements of the system.

Adaptive and evolving. The capabilities and behaviors of the agents in complex systems change over time. Evolution leads to selection and proliferation of some agents while others become extinct. People adapt in response to experience, learning new ways to achieve their goals in the face of obstacles. Learning is not always beneficial, however, but often superstitious and parochial, maximizing local, short-term objectives at the expense of long-term success.

Characterized by trade-offs. Time delays in feedback channels mean the long-run response of a system to an intervention is often different from its short-run response. Low-leverage policies often generate transitory improvement before the problem grows worse, whereas high-leverage policies often cause worse-before-better behavior.

Counterintuitive. In complex systems, cause and effect are distant in time and space, whereas we tend to look for causes near the events we seek to explain. Our attention is drawn to the symptoms of difficulty rather than the underlying cause. High-leverage policies are often not obvious.

Policy resistant. The complexity of the systems in which we are embedded overwhelms our ability to understand them. The result: many seemingly obvious solutions to problems fail or actually worsen the situation.
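
The asymmetry between doing and undoing in the history-dependent item can be checked with a line of arithmetic. The sketch below assumes simple exponential decay and uses the half-life of about 24,000 years given in the box; the function name and variables are illustrative, not part of the original text.

```python
def fraction_remaining(years, half_life_years=24_000):
    """Fraction of an initial quantity left after `years` of exponential decay."""
    return 0.5 ** (years / half_life_years)

# Time spent producing the material versus one half-life of waiting.
after_cold_war = fraction_remaining(50)
after_one_half_life = fraction_remaining(24_000)
```

Under these assumptions, more than 99% of the material is still present after the 50 years it took to create it, and half remains after 24,000 years.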


Like organisms, social systems contain intricate networks of feedback processes, both self-reinforcing (positive) and self-correcting (negative) loops. However, studies show that people recognize few feedbacks; rather, people usually think in short, causal chains, tend to assume each effect has a single cause, and often cease their search for explanations when the first sufficient cause is found.26,27 Failure to focus on feedback in policy design has critical consequences. Suppose the hospital you run faces a deficit, caught between rising costs and increasing numbers of uninsured patients. In response, you might initiate quality improvement programs to boost productivity, announce a round of layoffs, and accelerate plans to offer new high-margin elective surgical services. Your advisors and spreadsheets suggest that these decisions will cut costs and boost income. Problem solved—or so it seems.

Contrary to the open-loop model behind these decisions, the world reacts to our interventions (Figure 1). There is feedback: our actions alter the environment and, therefore, the decisions we take tomorrow. Our actions may trigger so-called side effects that we did not anticipate. Other agents, seeking to achieve their goals, act to restore the balance that we have upset; their actions also generate intended and unintended consequences. Goals are also endogenous, evolving in response to changing circumstances. For example, we strive to earn more in a quest for greater happiness, but habituation and social comparison rapidly erode any increase in subjective well-being.28

Policy resistance arises because we do not understand the full range of feedbacks surrounding—and created by—our decisions. The improvement initiatives you mandated never get off the ground because layoffs destroyed morale and increased the workload for the remaining employees. New services were rushed to market before all the kinks were worked out; unfavorable word of mouth causes the number of lucrative elective procedures to fall as patients flock to competitors. More chronically ill patients show up in your ER with complications after staff cuts slashed resources for patient education and follow-up; the additional workload forces still greater cuts in prevention. Stressed by long hours and continual crisis, your most experienced nurses and doctors leave for jobs with competitors, further raising the workload and undercutting quality of care. Hospital-acquired infections and preventable errors increase. Malpractice claims multiply. Yesterday’s solutions become today’s problems.

Ignoring the feedbacks in which we are embedded leads to policy resistance as we persistently react to the symptoms of difficulty, intervening at low leverage points and triggering delayed and distant effects. The problem intensifies, and we react by pulling those same policy levers still harder in an unrecognized vicious cycle. Policy resistance breeds cynicism about our ability to change the world for the better. Systems thinking requires us to see how our actions feed back to shape our environment. The greater challenge is to do so in a way that empowers, rather than reinforces, the belief that we are helpless victims of forces that we neither influence nor comprehend.

Time delays

Time delays in feedback processes are common and particularly troublesome. Most obviously, delays slow the accumulation of evidence. More problematic, the short- and long-run impacts of our policies are often different (smoking gives immediate pleasure, while lung cancer develops over decades). Delays also create instability and fluctuations that confound our ability to learn. Driving a car, drinking alcohol, and building a new semiconductor plant all involve time delays between the initiation of a control action (accelerating/braking, deciding to “have another,” the decision to build) and its effects on the state of the system. As a result, decision makers often continue to intervene to correct apparent discrepancies between the desired and actual state of the system even after sufficient corrective actions have been taken to restore equilibrium. The result is overshoot and oscillation: stop-and-go traffic, drunkenness, and high-tech boom and bust cycles.29 Public health systems are not immune to these dynamics, from oscillations in incidence of infectious diseases, such as measles30 and syphilis,31 to the 2004–2005 flu vaccine fiasco, with scarcity and rationing followed within months by surplus stocks.32

Stocks and flows

Stocks and the flows that alter them (the concepts of prevalence and incidence in epidemiology) are fundamental in disciplines from accounting to zoology: a population is increased by births and decreased by mortality; the burden of mercury in a child’s body is increased by ingestion and decreased by excretion. The movement and transformation of material among states is central to the dynamics of complex systems. In physical and biological systems, resources are usually tangible: the stock of glucose in the blood; the number of active smokers in a population. The performance of public health systems, however, is also determined by resources such as physician skills, patient knowledge, community norms, and other forms of human, social, and political capital.

Research shows people’s intuitive understanding of stocks and flows is poor in two ways. First, narrow mental model boundaries mean that people are often unaware of the networks of stocks and flows that supply resources and absorb wastes. California’s Air Resources Board seeks to reduce air pollution by promoting so-called zero emission vehicles.33 True, zero emission vehicles need no tailpipe. But the plants required to make the electricity or hydrogen to run them do generate pollution. California is actually promoting displaced emission vehicles, whose wastes would blow downwind to other states or accumulate in nuclear waste dumps outside its borders. Air pollution causes substantial mortality, and fuel cells may prove to be an environmental boon compared with internal combustion. But no technology is free of environmental impact, and no legislature can repeal the second law of thermodynamics.

Second, people have poor intuitive understanding of the process of accumulation. Most people assume that system inputs and outputs are correlated (e.g., the higher the federal budget deficit, the greater the national debt will be).34 However, stocks integrate (accumulate) their net inflows. A stock rises even as its net inflow falls, as long as the net inflow is positive: the national debt rises even as the deficit falls—debt falls only when the government runs a surplus; the number of people living with HIV continues to rise even as incidence falls—prevalence falls only when infection falls below mortality. Poor understanding of accumulation has significant consequences for public health and economic welfare. Surveys show most Americans believe climate change poses serious risks, but they also believe that reductions in GHG emissions sufficient to stabilize atmospheric GHG concentrations can be deferred until there is greater evidence that climate change is harmful.35 Federal policy makers likewise argue that it is prudent to wait and see whether climate change will cause substantial economic harm before undertaking policies to reduce emissions.36 Such wait-and-see policies erroneously presume that climate change can be reversed quickly should harm become evident, underestimating immense delays in the climate’s response to GHG emissions. Emissions are now about twice the rate at which natural processes remove GHGs from the atmosphere.37 GHG concentrations will therefore continue to rise even if emissions fall, stabilizing only when emissions equal removal. In contrast, experiments with highly educated adults—graduate students at MIT—show that most believe atmospheric GHG concentrations can be stabilized while emissions into the atmosphere continuously exceed the removal of GHGs from it.35 Such beliefs are analogous to arguing that a bathtub filled faster than it drains will never overflow. 
They violate conservation of matter, and the violation matters: wait-and-see policies guarantee that atmospheric GHG concentrations, already greater than any in the past 420,000 years,37 will rise far higher, increasing the risk of dangerous changes in climate that may significantly harm public health and human welfare.
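
The bathtub logic can be made concrete with a small accumulation sketch. The numbers are purely illustrative and carry no physical units: emissions start at twice the removal rate, as in the text, and then fall every period, while removal is held constant (a simplifying assumption). The function and variable names are hypothetical.

```python
def accumulate(stock, inflows, outflows):
    """Integrate the net flow into a stock, one period per step."""
    path = [stock]
    for inflow, outflow in zip(inflows, outflows):
        stock += inflow - outflow
        path.append(stock)
    return path

# Illustrative values only: removal is constant, emissions fall steadily
# from twice the removal rate but stay above it throughout.
removal = [5.0] * 10
emissions = [10.0 - 0.5 * t for t in range(10)]
concentration = accumulate(100.0, emissions, removal)
```

Emissions decline in every period, yet the concentration rises in every period, because the inflow still exceeds the outflow. In this sketch the stock would stop rising only once emissions fell to the removal rate.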

Just as dynamics arise from feedback, so too all learning depends on feedback. As we perceive discrepancies between desired and actual states, we take actions that (we believe) will cause the real world to move toward the desired state. New information about the state of the world causes us to revise our perceptions and the decisions we make in the future. When driving, I may turn the steering wheel too little to bring the car back to the center of my lane, but as visual feedback reveals the error, I continue to turn until the car returns to the straight and narrow. Such single-loop learning is shown in the top of Figure 2.

Information feedback about the real world is not the only input to our decisions. Decisions are the result of applying a decision rule or policy to information about the world as we perceive it.8 These policies are conditioned by institutional structures, organizational strategies, and cultural norms, which, in turn, are shaped by our mental models (see the bottom of Figure 2). Single-loop learning is the process whereby we learn to reach our current goals in the context of our existing mental models. Single-loop learning does not result in deep change in our mental models—the time horizon we consider relevant—nor in our goals and values.

Deep change in mental models, or double-loop learning,38 arises when evidence not only alters our decisions within the context of existing frames, but also feeds back to alter our mental models. As our mental models change, we change the structure of our systems, creating different decision rules and new strategies. The same information, interpreted by a different model, now yields a different decision. Systems thinking is an iterative learning process in which we replace a reductionist, narrow, short-run, static view of the world with a holistic, broad, long-term, dynamic view, reinventing our policies and institutions accordingly.

For learning to occur, each link in the single- and double-loop learning processes must work effectively, and we must be able to cycle around the loops faster than changes in the real world render existing knowledge obsolete. Yet these feedbacks often do not operate well. Each link in the learning loops can fail (Figure 2).

Limited information and ambiguity

We experience the real world through filters. No one knows the current incidence or prevalence of any disease. Instead surveillance systems report estimates of these data on the basis of sampled, averaged, and delayed measurements. The act of measurement introduces distortions, delays, biases, errors, and other imperfections, some known, others unknown and unknowable.

Above all, measurement is an act of selection. Our senses and information systems select but a tiny fraction of possible experience. We define GDP so that medical care caused by pollution-induced disease adds to the GDP, whereas the production of the pollution itself does not reduce it. Because the prices of most goods do not include the costs and consequences of environmental degradation and resource depletion, these externalities receive little weight in policymaking.39,40

Of course, the information systems governing the feedback we receive can change as we learn. Figure 2 also shows feedback between mental models and the information feedback available to us: seeing is believing and believing is seeing. Through our mental models, we define constructs, such as GDP, and design systems to evaluate and report them. We conflate what is salient, tangible, and familiar with what is important. As we measure these things, they become even more real, whereas the remote effects of our decisions, the unfamiliar, and the intangible fade like wraiths. Thus we confuse the military budget with security, GDP per capita with happiness, and the size of our houses with the quality of our home life.

The self-reinforcing feedback between expectations and perceptions has been repeatedly demonstrated.27 Sometimes the positive feedback assists learning by sharpening our ability to perceive features of the environment, as when an experienced naturalist identifies a bird in a distant bush where the novice sees only a tangled thicket. Often, however, the mutual feedback of expectations and perception blinds us to the anomalies that might challenge our mental models and lead to deep insight.41

As one of many examples, consider the history of ozone depletion by chlorofluorocarbons (CFCs). The first evidence describing the ability of CFCs to destroy atmospheric ozone was published in 1974.42,43 Industries dependent on CFCs argued that uncertainty in the evidence warranted inaction. Despite a ban on CFCs as aerosol propellants, global production of CFCs remained near its all-time high. It was not until 1985 that evidence of the Antarctic ozone hole was published.44 As described by Meadows, Meadows, and Randers:

The news reverberated around the scientific world. Scientists at [NASA]...scrambled to check readings on atmospheric ozone made by the Nimbus 7 satellite, measurements that had been taken routinely since 1978. Nimbus 7 had never indicated an ozone hole.

Checking back, NASA scientists found that their computers had been programmed to reject very low ozone readings on the assumption that such low readings must indicate instrument error.45(p151–152)

Scientists’ preconceptions about “normal” ozone concentrations led them to design a measurement system that made it impossible to detect evidence that might have shown that belief to be wrong. Fortunately, NASA had saved the original, unfiltered data, and later confirmed that ozone concentrations had indeed been falling since the launch of Nimbus 7. By creating a measurement system immune to disconfirmation, the discovery of the ozone hole and resulting global agreements to cease CFC production were delayed by as much as 7 years.
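
A toy version of such a filter shows how a plausibility check can hide exactly the evidence that matters. This is not the actual NASA processing code; the threshold, the readings, and the function name are invented for illustration.

```python
def filter_readings(readings, plausible_min):
    """Drop readings below `plausible_min`, treating them as instrument error."""
    return [r for r in readings if r >= plausible_min]

# Made-up values showing a steady decline in the quantity being measured.
raw = [310, 295, 280, 240, 190, 170]
kept = filter_readings(raw, plausible_min=200)
```

The filtered series ends at 240, so an analyst who sees only `kept` has no sign that the lowest readings ever occurred. Keeping `raw` alongside the filtered data, as NASA did, is what makes later correction possible.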

Bounded rationality and the misperceptions of feedback

Humans are not computers, coolly assessing possibilities and probabilities. Emotions, reflex, unconscious motivations, and other nonrational or irrational factors all play a large role in our judgments and behavior. But even when we find the time to deliberate, we cannot behave in a fully rational manner (that is, make the best decisions possible given the available information). As marvelous as the human mind is, the complexity of the real world dwarfs our cognitive capabilities. Herbert Simon articulated these limits in his famous principle of “bounded rationality,” for which he won the Nobel Memorial Prize in economics in 1978:

The capacity of the human mind for formulating and solving complex problems is very small compared with the size of the problem whose solution is required for objectively rational behavior in the real world or even for a reasonable approximation to such objective rationality.46(p198)

Faced with the overwhelming complexity of the real world, time pressure, and limited cognitive capabilities, we are forced to fall back on rote procedures, habit, rules of thumb, and simple mental models. Although we sometimes strive to make the best decisions we can, bounded rationality means that we often systematically fall short.

Bounded rationality is particularly acute in dynamic systems. Experiments show that people do quite poorly in systems with even modest levels of dynamic complexity; for example, creating business cycles,29 bankrupting their companies,47 depleting renewable resources,48 and delaying medical treatment while (simulated) patients sicken and die.49 These misperceptions of feedback are robust to experience and financial incentives.9

Among the most damaging misperceptions is the tendency to attribute the behavior of others to dispositional rather than situational factors; that is, to character and especially character flaws rather than the system in which they are embedded—the “fundamental attribution error.” The atrocities at Abu Ghraib were blamed on a few bad apples, whereas decades of research, from Milgram’s obedience studies and the Stanford prison experiment on, demonstrate that “it’s not the apples, it’s the barrel.”50 Despite overwhelming evidence that our behavior is molded by pressures created by the systems in which we act, problems, such as the failure of patients to stay on their medications, recidivism among drug users, and childhood obesity, are persistently attributed to the undisciplined personal habits, poor attitude, or low intelligence of these “others.”51 The focus becomes scapegoating and blame, and policy centers on controls to force compliance. Blame and attempts to control behavior provoke resistance and patient drop-out, strengthening the erroneous belief that these people are unreliable incompetents requiring still greater monitoring and control.52 Recognizing the power of system structure to shape behavior does not relieve us of personal responsibility for our actions. To the contrary, it enables us to focus our efforts where they have highest leverage—the design of systems in which ordinary people can achieve extraordinary results.53

Poor inquiry skills

Learning effectively in a world of dynamic complexity requires dedicated application of scientific method. Unfortunately, people are poor intuitive scientists. We do not generate alternative explanations or control for confounding variables. Our judgments are strongly affected by the frame in which the information is presented, even when the objective information is unchanged. We suffer from overconfidence in our judgments (underestimating uncertainty), wishful thinking (assessing desired outcomes as more likely than undesired outcomes), and confirmation bias (seeking evidence consistent with our preconceptions). Scientists and professionals, not only “ordinary” people, suffer from many of these judgmental biases.27,54

Some argue that, while people err in applying the principles of logic, at least they appreciate the desirability of scientific explanation. Unfortunately, the situation is far worse. The scientific worldview is a recent development in human history, and remains rare. Many place their faith in what Dostoyevsky’s Grand Inquisitor called “miracle, mystery, and authority”55; for example, astrology, creationism, Elvis sightings, and cult leaders promising Armageddon. Such superstitions persist because they are strongly self-reinforcing: during his career with the Boston Red Sox, Hall of Fame hitter Wade Boggs ate chicken every game-day for years because he once played particularly well after a dinner of lemon chicken.56 While on the chicken diet, which he came to loathe, Boggs won five batting championships, decisively proving the “chicken theory.”

Such foolishness aside, there are more disturbing reasons for the prevalence of these learning failures. Human beings are more than cognitive information processors. We have a deep need for emotional and spiritual sustenance. But from Copernican heliocentrism through relativity, quantum mechanics, and evolution, science has stripped away ancient and comforting beliefs that placed humanity at the center of a world designed for us by a supreme authority. For many people, science leads not to enlightenment and empowerment, but to existential angst and the absurdity of human insignificance in an incomprehensibly vast universe. Others believe science and technology are the shock troops for the triumph of materialism and instrumentalism over the sacred and spiritual. These antiscientific reactions are powerful forces. In many ways, they are important truths. They have led to many of the most profound works of art and literature. But they can also lead to mindless new-age psycho-babble and radical fundamentalism.

Readers should not conclude that I am a naive defender of science as it is practiced nor an apologist for the real and continuing damage done to the environment and to our cultural, moral, and spiritual lives in the name of rationality and progress. On the contrary, I have stressed the research showing that scientists are often as prone to error and bias as lay people. It is precisely because scientists are subject to the same cognitive limitations and moral failures as others that we experience abominations such as the Tuskegee experiment.57 Systems thinking requires us to examine issues from multiple perspectives, to expand the boundaries of our mental models, to consider the long-term consequences of our actions, including their environmental, cultural, and moral implications.58,59

Defensive routines and implementation failure

Learning by groups can be thwarted even if participants receive excellent information and reason well as individuals. Argyris and Schön,38 Janis,60 Schein,61 and others document the defensive routines people rely on, often unknowingly, in interpersonal interactions. We use defensive routines to save face, make untested inferences seem like facts, and advocate our positions while appearing to be neutral. We make strong attributions not grounded in data. We avoid publicly testing our beliefs, tacitly communicating that we are not open to having our mental models challenged. Defensive routines often yield groupthink, as members of a group mutually reinforce their current beliefs, suppress dissent, and seal themselves off from those with different views or possible disconfirming evidence.

Even when a team is united in recommending the proper course of action, the implementation of its decisions is often distorted by asymmetric information, private agendas, and game playing by agents throughout the system. Obviously, implementation failures can hurt an organization. Imperfect implementation can hinder learning as well, because the managers evaluating the outcomes of their decisions may not know the ways in which those decisions were distorted, delayed, or derailed altogether by other actors in the system.

Finally, because error is often costly and many decisions are irreversible, the need to maintain performance often overrides the experimentation needed to learn. It’s important for pilots to learn how steep a dive their aircraft can handle in case an emergency requires rapid descent. But no pilot would try a maximum dive on the 10 o’clock to Chicago just to learn. Even when the consequences of experiments are mild, however, the fear of failure, of appearing to have made a mistake, often stifles innovation. Voltaire advised that we “love truth and pardon error,”62 but the desire to avoid embarrassment regularly suppresses deviations from standard practice that might reveal opportunities for improvement.

To learn effectively in a world of dynamic complexity, we must attend to all these impediments (Figure 3). The figure features a new feedback loop created by the use of virtual worlds. Virtual worlds are models or simulations in which decision makers can conduct experiments, rehearse decision making, and play.63 They can be physical models, role-plays, or computer simulations. In systems with significant dynamic complexity, computer simulation will typically be needed.

Simulations provide low-cost laboratories for learning. The virtual world allows time and space to be compressed or dilated. Actions can be repeated under the same or different conditions. One can stop the action to reflect. One can make decisions that are infeasible or unethical in the real system. Participants can receive perfect and immediate outcome feedback. In an afternoon one can gain years of simulated experience. In contrast to the real world, which, like a black box, has a poorly resolved structure, virtual worlds can be open boxes whose assumptions are known and can be modified by the learner. Often, pushing a system into extreme conditions reveals more about its structure and dynamics than incremental adjustments to current practices. Thus a great deal of the time that pilots spend in flight simulators is devoted to extreme conditions, such as engine failure. In the virtual world, you can find the maximum dive angle—“crashing” hurts no one, and you walk away every time, better prepared for a real emergency.
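The following is a minimal illustrative sketch of such a virtual world, written in Python; it is not a model taken from the studies cited here. It simulates a toy inventory that a decision maker tries to hold at a target level while deliveries arrive only after a fixed delay. Every name and number in it (`simulate`, `supply_line_weight`, the four-period delay, the jump in usage at period 5) is an assumption chosen for illustration.

```python
def simulate(supply_line_weight, steps=60, delay=4, target=100.0, adjust_time=2.0):
    """Toy stock-management 'virtual world' (all values are illustrative assumptions).

    supply_line_weight = 0.0 ignores orders already placed but not yet received;
    supply_line_weight = 1.0 fully accounts for them.
    Returns the stock level at the end of each period.
    """
    usage = 10.0                      # units consumed per period
    stock = target                    # start in equilibrium at the target
    in_transit = [usage] * delay      # orders placed earlier, oldest first
    history = []
    for t in range(steps):
        if t == 5:
            usage = 15.0              # hypothetical shock: usage steps up
        # Receive the oldest order and consume this period's usage.
        stock += in_transit.pop(0) - usage
        # Gaps between desired and actual stock and supply line.
        stock_gap = target - stock
        supply_gap = usage * len(in_transit) - sum(in_transit)
        # Decision rule: replace usage and close the gaps over adjust_time periods.
        order = usage + (stock_gap + supply_line_weight * supply_gap) / adjust_time
        in_transit.append(max(0.0, order))   # orders cannot be negative
        history.append(stock)
    return history


if __name__ == "__main__":
    for weight in (0.0, 1.0):
        run = simulate(weight)
        print(f"supply_line_weight={weight}: "
              f"min stock={min(run):.1f}, max stock={max(run):.1f}")
```

Rerunning the same scenario with `supply_line_weight` set to 0 and then to 1 is the kind of repeatable experiment described above. In this sketch, ignoring the orders already in transit makes the stock overshoot its target and oscillate, whereas accounting for them lets the stock recover without overshoot. Because the assumptions are in plain view, a learner can change the delay, the shock, or the decision rule and run the experiment again.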

Dubbed the “third branch” of science (after theory and experiment), simulation is now an essential tool in research on problems from galaxy formation to protein folding to epidemiology. Virtual worlds for learning and training are commonplace in the military, pilot training, power plant operations, and other tasks. The use of virtual worlds and simulation models in public policy and management is more recent and less widely adopted. Yet these are precisely the settings in which dynamic complexity is most problematic, the learning feedbacks are least effective, and the stakes are highest.

To be effective, a virtual world must capture those aspects of the real system of concern to the decision makers with sufficient fidelity for their purpose. In addition, the user interface must enable people to learn from the model. The most insightful model accomplishes nothing if the interface is obscure and the protocol for its use ineffective. The converse is worse: a poor model embedded in a potent interface may teach harmful lessons more effectively than ever before. Effective virtual worlds require both substantive fidelity and a productive learning process that enables people to challenge and improve their mental models.

How can the substantive quality of a model be assessed? System dynamics emphasizes a multifaceted process for testing models, identifying errors, and comparing model assumptions and behavior to data. The process of model testing and improvement is iterative. Discrepancies between mental models, formal models, and data stimulate improvements in each.64,65 For the testing process to be effective, models must be fully documented so that independent third parties can replicate the results, carry out sensitivity analysis, try alternative theories, and subject the model to extreme conditions. Space does not permit a full treatment of these tests; readers should consult the extensive literature for principles and examples.9
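Two of the tests named above, extreme conditions and sensitivity analysis, can be sketched in a few lines of Python. The model below is a hypothetical, textbook-style epidemic model (susceptible, infectious, recovered) used purely as an example; it is not one of the models cited in this article, and its function name, parameters, and values are assumptions.

```python
def run_sir(contact_rate, recovery_time=5.0, population=1000.0,
            initial_infected=1.0, days=200, dt=0.25):
    """Euler-integrated SIR sketch; returns final stocks and peak infectious."""
    susceptible = population - initial_infected
    infectious = initial_infected
    recovered = 0.0
    peak = infectious
    for _ in range(int(days / dt)):
        # Flows over one time step of length dt.
        infections = contact_rate * susceptible * infectious / population * dt
        recoveries = infectious / recovery_time * dt
        # Stocks change only through their flows, so people are conserved.
        susceptible -= infections
        infectious += infections - recoveries
        recovered += recoveries
        peak = max(peak, infectious)
    return {"s": susceptible, "i": infectious, "r": recovered, "peak": peak}


# Extreme-condition test: with no infectious contacts there can be no epidemic.
no_contact = run_sir(contact_rate=0.0)
assert no_contact["peak"] == 1.0 and no_contact["s"] == 999.0

# Conservation test: the three stocks must always sum to the population.
base = run_sir(contact_rate=0.5)
assert abs(base["s"] + base["i"] + base["r"] - 1000.0) < 1e-6

# Sensitivity analysis: sweep an uncertain parameter and compare outcomes.
sweep = {rate: run_sir(rate)["peak"] for rate in (0.3, 0.5, 1.0)}
for rate, peak in sweep.items():
    print(f"contact_rate={rate}: peak infectious = {peak:.0f}")
```

Checks like these are only possible when the model is fully documented: a third party who can read the equations can rerun them, add further tests, or swap in an alternative theory.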

Although simulation models may be necessary for effective learning in dynamically complex systems, they are not sufficient to overcome the flaws in our scientific reasoning skills and group processes. Although a virtual world enables controlled experimentation, most policy makers lack training in scientific method and the design of experiments. A commonly observed behavior in virtual worlds is the “video game syndrome” in which people play too much and think too little. People often do not take time to reflect on the outcome of a simulation, identify discrepancies between outcomes and expectations, formulate hypotheses to explain the discrepancies, and then devise experiments to discriminate among competing theories. Defensive routines and groupthink can operate in the learning laboratory just as in real organizations. Indeed, protocols for effective learning in virtual worlds, such as public testing of hypotheses, can be highly threatening, inducing defensive reactions that prevent learning.66 Managers unaccustomed to disciplined scientific reasoning in an open, trusting environment will have to build these skills before a virtual world can prove useful.

When evidence cannot be generated through experiments in the real world, virtual worlds and simulation become the only reliable way to test hypotheses and evaluate the likely effects of policies. Without the rigorous testing enabled by simulation, it becomes all too easy for policy to be driven by ideology, superstition, or unconscious bias. Most important, when experimentation in real systems is infeasible, simulation is often the only way we can discover for ourselves how complex systems work. The alternative is rote learning on the basis of the authority of an expert, a method that dulls creativity and stunts the development of the skills needed to catalyze effective change in complex systems.

When humans evolved, the challenge was survival in a world we could barely influence. Today, the hurricane and earthquake do not pose the greatest danger. The greatest danger lies in the unanticipated effects of our own actions, effects created by our inability to understand the complex systems we have created and in which we are embedded. Creating a healthy, sustainable future requires a fundamental shift in the way we generate, learn from, and act on evidence about the delayed and distal effects of our technologies, policies, and institutions. The reductionist program of ever-finer specialization is no longer sufficient. Though often leading to deep and useful knowledge, it contributes to policy resistance by narrowing the boundaries of our mental models. As leaders in public health, you do not face medical problems, financial problems, technical problems, and community-relations problems. You just have problems. Some boundaries are necessary and inevitable: all models must simplify the overwhelming complexity of the world. But all too often, ignoring what lies outside familiar walls cuts critical feedbacks and breeds arrogance about our ability to control nature and other people, and we solve one problem only to create others.

What prevents us from overcoming policy resistance is not a lack of resources, technical knowledge, or genuine commitment to change. What thwarts us is our lack of a meaningful systems thinking capability. That capability requires tools to understand complexity, stocks and flows, feedback, and time delays. It requires the use of virtual worlds and simulations to augment the evidence generated by experiments in the real world. It requires an unswerving commitment to the rigorous application of scientific method, and the inquiry skills we need to expose our hidden assumptions and biases. It requires crossing boundaries between departments and functions in an organization, between disciplines in the academy, between the private and public sectors. It requires breaching barriers of culture and class, race and religion. It requires listening with respect and empathy to others, and then using these systems thinking capabilities to act in consonance with our long-term goals and deepest aspirations.

Financial support was provided by the Project on Innovation in Markets and Organizations at the MIT Sloan School of Management.

This paper builds on and extends the argument in Chapter 1 of Sterman (2000).9

I thank Jack Homer, Scott Leischow, Bobby Milstein, Mary Northridge, Kim Thompson, and the referees for helpful suggestions.

Human Participant Protection

No human participants were involved in this study.


1. Centers for Medicare and Medicaid Services. Historical Statistics file NHEGDP03.zip. Available at: http://www.cms.hhs.gov/statistics/nhe/default.asp?#download. Accessed February 20, 2005.
2. World Health Organization. World Health Report 2004, Statistical Annex. Available at: http://www.who.int/whr/2004/annex/en. Accessed February 20, 2005.
3. Health, United States, 2004 with Chartbook on Trends in the Health of Americans. Hyattsville, Maryland: National Center for Health Statistics; 2004.
4. Hedley AA, Ogden CL, Johnson CL, Carroll MD, Curtin LR, Flegal KM. Overweight and obesity among US children, adolescents, and adults, 1999–2002. JAMA. 291:2847–2850.
5. Zack MM, Moriarty DG, Stroup DF, Ford ES, Mokdad AH. Worsening trends in adult health-related quality of life and self-rated health: United States, 1993–2001. Public Health Rep. 2004;119:493–505.
6. Institute of Medicine (Committee on Quality of Health Care in America). Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academies Press; 2001.
7. Fong IW, Drlica K. Reemergence of Established Pathogens in the 21st Century. New York, NY: Kluwer/Plenum Press; 2003.
8. Forrester JW. Industrial Dynamics. Cambridge, MA: MIT Press; 1961.
9. Sterman JD. Business Dynamics: Systems Thinking and Modeling for a Complex World. Boston, MA: Irwin/McGraw-Hill; 2000.
10. Homer JB, Hirsch GB. System dynamics modeling for public health: background and opportunities. Am J Public Health. 2006;96:452–458.
11. Jones AP, Homer JB, Murphy DL, Essien JDK, Milstein B, Seville DA. Understanding diabetes population dynamics through simulation modeling and experimentation. Am J Public Health. 2006;96:488–494.
12. Ritchie-Dunham JL, Mendéz Galvan JF. Evaluating epidemic intervention policies with systems thinking: a case study of dengue fever in Mexico. Syst Dyn Rev. 1999;15:119–138.
13. Homer J, Ritchie-Dunham J, Rabbino H, Puente LM, Jorgensen J, Hendricks K. Toward a dynamic theory of antibiotic resistance. Syst Dyn Rev. 2000;16:287–319.
14. Kaplan EH, Craft DL, Wein LM. Emergency response to a smallpox attack: The case for mass vaccination. Proc Natl Acad Sci USA. 2002;99:10935–10940.
15. Tengs TO, Ahma S, Savage JM, Moore R, Gage E. The AMA proposal to mandate nicotine reduction in cigarettes: a simulation of the population health impacts. Prev Med. 2005;40:170–180.
16. Duintjer Tebbens RJ, Pallansch MA, Kew OM, Cáceres VM, Sutter RW, Thompson KM. A dynamic model of poliomyelitis outbreaks: learning from the past to help inform the future. Am J Epidemiol. 2005; 358–372.
17. Forrester JW. Counterintuitive behavior of social systems. Technol Rev. 1971;73:53–68.
18. Mosteller F. Innovation and evaluation. Science. 1981;211:881–886.
19. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 1: the content, quality, and accessibility of care. Ann Intern Med. 2003;138:273–287.
20. Clancy CM, Cronin K. Evidence-Based Decision Making: Global Evidence, Local Decisions. Health Aff. 2005;24:151–162.
21. McGahan A. The Performance of US Corporations: 1981–1994. J Ind Econ. 1999;47:373–398.
22. Gawande A. The bell curve. The New Yorker, December 6, 2004. Available at: http://www.newyorker.com/fact/content/?041206fa_fact. Accessed February 20, 2005.
23. Union of Concerned Scientists. Restoring Scientific Integrity. 2005. Available at: http://www.ucsusa.org/global_environment/rsi/index.cfm. Accessed February 20, 2005.
24. United States Freedom Foundation. Helmet Laws of the 50 States and How to Beat Them! Available at: http://www.usff.com/hldl/hlstatutes/50statehls.html. Accessed February 20, 2005.
25. Jansen VAA, Stollenwerk N, Jensen HJ, Ramsay ME, Edmunds WJ, Rhodes CJ. Measles outbreaks in a population with declining vaccine uptake. Science. 2003;301:804.
26. Dörner D. The Logic of Failure. New York, NY: Metropolitan Books/Henry Holt; 1996.
27. Plous S. The Psychology of Judgment and Decision Making. New York, NY: McGraw Hill; 1993.
28. Kahneman D, Diener E, Schwarz N. Well-Being: The Foundations of Hedonic Psychology. New York, NY: Russell Sage; 1999.
29. Sterman JD. Modeling managerial behavior: misperceptions of feedback in a dynamic decision making experiment. Manage Sci. 1989;35:321–339.
30. Anderson RM, Grenfell BT, May RM. Oscillatory fluctuations in the incidence of infectious disease and the impact of vaccination: time series analysis. J Hyg (Lond). 1984;93:587–608.
31. Grassly NC, Fraser C, Garnett GP. Host immunity and synchronized epidemics of syphilis across the United States. Nature. 2005;433:417–421.
32. US House Committee on Government Reform, 109th Congress. The Perplexing Shift from Shortage to Surplus: Managing This Season’s Flu Shot Supply and Preparing for the Future. 10 Feb 2005. Available at: http://reform.house.gov/GovReform/Hearings/EventSingle.aspx?EventID=21749. Accessed December 10, 2005.
33. California Air Resources Board. Zero Emission Vehicle Program. Available at: http://www.arb.ca.gov/msprog/zevprog/zevprog.htm. Accessed February 28, 2005.
34. Booth Sweeney L, Sterman JD. Bathtub dynamics: initial results of a systems thinking inventory. Syst Dyn Rev. 2000;16:249–294.
35. Sterman JD, Booth Sweeney L. Understanding Public Complacency about Climate Change: Adults’ Mental Models of Climate Change Violate Conservation of Matter. Climatic Change. Forthcoming 2006. Available at: http://web.mit.edu/jsterman/www/Understanding_public.html. Accessed January 25, 2006.
36. Bush GW. President Announces Clear Skies and Global Climate Change Initiatives. 2002. Available at: http://www.whitehouse.gov/news/releases/2002/02/20020214-5.html. Accessed February 20, 2005.
37. Houghton J, Ding Y, Griggs D, et al. Climate Change 2001: The Scientific Basis. Cambridge, UK: Cambridge University Press; 2001.
38. Argyris C, Schön D. Organizational Learning: A Theory of Action Approach. Reading, MA: Addison-Wesley; 1978.
39. Cobb J, Daly H. For the Common Good. Boston, MA: Beacon Press; 1989.
40. Balmford A, Bruner A, Cooper P, et al. Economic reasons for conserving wild nature. Science. 2002;297:950–953.
41. Kuhn TS. The Structure of Scientific Revolutions. 2nd ed. Chicago: University of Chicago Press; 1970.
42. Molina M, Rowland F. Stratospheric sink for chlorofluoromethanes: chlorine atom-catalysed destruction of ozone. Nature. 1974;249:810–812.
43. Stolarski R, Cicerone R. Stratospheric chlorine: a possible sink for ozone. Can J Chem. 1974;52:1610.
44. Farman J, Gardiner B, Shanklin J. Large losses of total ozone in Antarctica reveal seasonal ClO/NO2 interaction. Nature. 1985;315:207–210.
45. Meadows DH, Meadows DL, Randers J. Beyond the Limits. Post Mills, VT: Chelsea Green Publishing Company; 1992.
46. Simon HA. Administrative Behavior: A Study of Decision-Making Processes in Administrative Organizations. 2nd ed. New York, NY: Macmillan; 1957.
47. Paich M, Sterman JD. Boom, Bust, and Failures to Learn in Experimental Markets. Manage Sci. 1993;39:1439–1458.
48. Moxnes E. Not only the tragedy of the commons: misperceptions of feedback and policies for sustainable development. Syst Dyn Rev. 2000;16:325–348.
49. Kleinmuntz D, Thomas J. The value of action and inference in dynamic decision making. Organ Behav Hum Decis Process. 1987;39:341–364.
50. Fiske ST, Harris LT, Cuddy AJC. Why ordinary people torture enemy prisoners. Science. 2004;306:1482–1483.
51. Redwood H. Do you intend to be a ‘responsible patient’? Health and Age. 2004. Available at: http://www.healthandage.com/PHome/gm=20!gid2=1695. Accessed February 28, 2005.
52. Repenning NP, Sterman JD. Capability traps and self-confirming attribution errors in the dynamics of process improvement. Adm Sci Q. 2002;47:265–295.
53. Repenning NP, Sterman JD. Nobody ever gets credit for fixing problems that never happened: creating and sustaining process improvement. Calif Manage Rev. 2001;43:64–88.
54. Kahneman D, Slovic P, Tversky A. Judgment Under Uncertainty: Heuristics and Biases. Cambridge, UK: Cambridge University Press; 1982.
55. Dostoyevsky F. The Brothers Karamazov. Garnett C, trans. New York: Vintage Books; 1950.
56. Shaughnessy D. Bogged Down. Boston Globe. March 15, 1987;20.
57. National Center for HIV, STD, and TB Prevention. Tuskegee Timeline. 2005. Available at: http://www.cdc.gov/nchstp/od/tuskegee/time.htm. Accessed May 20, 2005.
58. Sterman JD. All models are wrong: reflections on becoming a systems scientist. Syst Dyn Rev. 2002;18:501–531.
59. Meadows DH, Richardson J, Bruckmann G. Groping in the Dark. Chichester, England: John Wiley & Sons; 1982.
60. Janis I. Groupthink: Psychological Studies of Policy Decisions and Fiascoes. 2nd ed. Boston, MA: Houghton Mifflin; 1982.
61. Schein E. Process Consultation. Vol 1 (revised ed.). Reading, MA: Addison-Wesley; 1988.
62. Voltaire. Sept Discours en Vers sur l’Homme. 1738.
63. Schön D. The Reflective Practitioner. New York, NY: Basic Books; 1983.
64. Homer JB. Why we iterate: scientific modeling in theory and practice. Syst Dyn Rev. 1996;12:1–19.
65. Homer JB. Structure, data and compelling conclusions: notes from the field. Syst Dyn Rev. 1997;13:293–309.
66. Isaacs W, Senge PM. Overcoming limits to learning in computer-based learning environments. Eur J Oper Res. 1992;59:183–196.
67. Horn SD, Sharkey PD, Tracy DM, Horn CE, James B, Goodwin F. Intended and unintended consequences of HMO cost-containment strategies: results from the Managed Care Outcomes project. Am J Manag Care. 1996;2:253–264.
68. Wilde GE. Target Risk 2: A New Psychology of Safety and Health. 2nd ed. Toronto, Canada: PDE Publications; 2001.
69. Substance Abuse and Mental Health Services Administration. National Survey on Drug Use and Health (NSDUH). Available at: http://www.drugabusestatistics.samhsa.gov. Accessed February 20, 2005.
70. Influence of Forest Structure on Wildfire Behavior and the Severity of Its Effects. Washington, DC: USDA Forest Service; 2003. Available at: http://www.fs.fed.us/projects/hfi/science.shtml. Accessed February 20, 2005.
71. Palumbi SR. Humans as the world’s greatest evolutionary force. Science. 2001;293:1786–1790.
72. Lightfoot M, Swendeman D, Rotheram-Borus MJ, Comulada WS, Weiss R. Risk behaviors of youth living with HIV: pre- and post-HAART. Am J Health Behav. 2005;29:162–172.


John D. Sterman, PhD. The author is with the MIT Sloan School of Management, Cambridge, Mass. “Learning from Evidence in a Complex World,” American Journal of Public Health 96, no. 3 (March 1, 2006): pp. 505-514.


PMID: 16449579