Solving for Why (Communications of the ACM)
Thanks to large datasets and machine learning, computers have become surprisingly adept at finding statistical relationships among many variables—and exploiting these patterns to make useful predictions. Whether the task involves recognizing objects in photographs or translating text from one language to another, much of what today’s intelligent machines can accomplish stems from the computers’ ability to make predictions based on statistical associations, or correlations.
By and large, computers are very good at this kind of prediction. Yet for many tasks, that is not enough. “In reality, we often want to not only predict things, but we want to improve things,” says Jonas Peters, a professor of statistics at the University of Copenhagen. “This is what you need causal methods for,” explains Peters, co-author of a book about causal inference, a field of study that has attracted growing interest in computing and other sciences in recent years. As the field has matured, it has gained mathematical rigor, giving scientists across a variety of disciplines a formal language for expressing their assumptions explicitly and better tools for acquiring new knowledge.
Within computing, a focus on causality promises to help researchers overcome shortcomings that have bedeviled more traditional approaches to artificial intelligence (AI).
When Peters talks about improving things, the “things” he is referring to span everything from helping college students grasp new material to slowing down climate change. Which teaching methods are most effective? What policies actually reduce global warming? As Peters points out, these are not classical prediction questions, but causal questions. These and countless similar questions aim to uncover not mere correlations between one variable and another, but how actively changing one variable affects what happens to another variable—a causal understanding of the underlying system.
For example, if you know that A and B are correlated, is it the case that A causes B? Does, instead, B cause A? Or is there no direct causal connection between A and B at all—does the correlation between A and B occur due to some third variable, C, that causes both A and B? Having the correct causal model is essential to knowing what to do to get the outcome you want. “You need a causal understanding if you want to make any kind of decisions,” says Murat Kocaoglu, who teaches and conducts research on causal inference at Purdue University’s School of Electrical and Computer Engineering. “Imagine that you have some policy”—some action that you are considering—“and you want to understand what will happen when you implement that policy. A causal model can help you estimate the outcome of your action.”
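A small simulation can make the distinction concrete. The sketch below is purely illustrative (the variables and coefficients are invented): a hidden common cause C drives both A and B, so the two are correlated even though neither causes the other, and intervening on A leaves B untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical structure: a hidden common cause C drives both A and B.
# There is no arrow between A and B, yet they end up correlated.
C = rng.normal(size=n)
A = 2.0 * C + rng.normal(size=n)
B = -1.5 * C + rng.normal(size=n)

print("corr(A, B) in observational data:", np.corrcoef(A, B)[0, 1])

# Intervening on A, written do(A), severs A from C.  Under this model,
# setting A by hand leaves B's distribution unchanged.
A_do = rng.normal(size=n)             # A set externally, ignoring C
B_do = -1.5 * C + rng.normal(size=n)  # B is still generated only from C

print("corr(A, B) under do(A):", np.corrcoef(A_do, B_do)[0, 1])
```

The first correlation is strong; the second is essentially zero, which is exactly the gap between predicting B from A and changing B by acting on A.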
If understanding causal relationships is crucial to making good decisions, then whenever we include AI in decision making, we have to teach it about those causal relationships. The importance of doing so is particularly clear when it comes to ensuring decisions are fair: How can we make sure that algorithmic decision making, especially when used in high-stakes areas like hiring, lending, and criminal sentencing, doesn’t perpetuate or even amplify the biases of human decision-makers?
Causality researchers believe teaching computers to understand causal relationships is key: a misguided focus on spurious correlations tends to lead to unfair discrimination, they say. “If it’s a correlation but not causal, it is often considered unfair,” Peters says, illustrating his point with the deliberately absurd example of hiring on the basis of hair length. If data shows that, on average, people with longer hair (who tend to be female) perform better in a particular job, and neither hair length nor gender actually improves job performance, using those criteria in hiring would be unfair both to men and to women with short hair; it would also be bad for the employer, whose goal is to hire the best person for the job. Figuring out what actually causes good performance, such as a specific set of skills, would lead to better hiring decisions.
Besides helping make better decisions, causal inference has the potential to improve the performance of machine learning algorithms even for purely predictive tasks, according to causality researchers. “Most of the bottlenecks and challenges that we have in AI today come from the lack of causality capabilities,” says Elias Bareinboim, an associate professor of computer science at Columbia University and director of the university’s Causal Artificial Intelligence Lab.
One of these big challenges is the fragility of many current AI systems—their tendency to fail when facing even a slightly different set of conditions than those on which they were trained. Most existing machine learning models, which ignore causality, assume training and test data come from the same distribution, but that is often not a safe assumption. For example, if you want to prepare a robot to dig rocks on Mars during a future space mission, the best you might be able to do is to train it to dig rocks on Earth, perhaps in the California desert. The same is true for self-driving cars trained in one country and later deployed elsewhere.
The goal, of course, is for robots and other AIs to adapt flexibly to changing conditions. This ability, after all, is a hallmark of human intelligence: if you learn to drive in an American suburb, chances are you’ll have little trouble driving in a suburb of France. AI researchers are exploring various approaches to making machines similarly adaptable, and researchers in the field of causal inference have good reason to think causality can help. “It is often safe to assume that most of the causal mechanisms remain stable,” Peters explains. “Understanding the causal structure of a system may therefore help us to adapt quickly to the new environment.”
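A minimal sketch of that intuition, with invented numbers: the mechanism generating Y from its cause X is held fixed across two environments, while the distribution of X shifts. A model fit in the causal direction keeps its training accuracy in the new environment; a model fit in the anticausal direction degrades.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_env(mean, std, n=50_000):
    # The causal mechanism Y = 2X + noise is identical in every
    # environment; only the distribution of the cause X shifts.
    X = rng.normal(mean, std, size=n)
    return X, 2.0 * X + rng.normal(size=n)

X_tr, Y_tr = sample_env(0.0, 1.0)  # training environment
X_te, Y_te = sample_env(3.0, 2.0)  # shifted deployment environment

def mse(w, x, y):
    return np.mean((np.polyval(w, x) - y) ** 2)

# Causal direction: predict the effect Y from its cause X.
w = np.polyfit(X_tr, Y_tr, 1)
print("causal:     train MSE %.2f, test MSE %.2f"
      % (mse(w, X_tr, Y_tr), mse(w, X_te, Y_te)))

# Anticausal direction: predict the cause X from its effect Y.
v = np.polyfit(Y_tr, X_tr, 1)
print("anticausal: train MSE %.2f, test MSE %.2f"
      % (mse(v, Y_tr, X_tr), mse(v, Y_te, X_te)))
```

The causal model’s error is unchanged because P(Y | X) is the same in both environments; P(X | Y) is not, so the anticausal model’s error grows several-fold.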
However, figuring out the causal structure of a system can be difficult for both machines and humans. “For the longest time, we didn’t even know if smoking caused cancer or not,” points out Kocaoglu. We had only observational data—that is, data showing a correlation between smoking and cancer. Since conducting randomized controlled trials on smoking was out of the question, and scientists in the 1950s had not yet developed methods for causal inference without such controlled experiments, they struggled to rule out the possibility that something other than smoking might be causing higher rates of lung cancer among smokers.
When young children learn about the world through play, they are doing something akin to randomized controlled trials. “A toddler can take actions and observe outcomes,” Kocaoglu says. Doing that enables toddlers to figure out, for example, that dropping a toy causes it to fall. But in classical machine learning, the data is not randomized: “If you have a single dataset, you have a single outcome for every row—you never took different actions and observed different outcomes,” says Kocaoglu, whose area of research is causal discovery, the process of trying to learn causal relationships from non-randomized data.
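The difference shows up clearly in a toy simulation (all numbers invented): in the observational dataset, a hidden trait nudges people both toward the “treatment” and toward a better outcome, so the naive comparison overstates the effect; randomizing the treatment breaks that link and recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
true_effect = 1.0

# Observational world: a hidden trait C nudges people both toward the
# "treatment" T and toward a better outcome Y.
C = rng.normal(size=n)
T = (C + rng.normal(size=n) > 0).astype(float)
Y = true_effect * T + 2.0 * C + rng.normal(size=n)
naive = Y[T == 1].mean() - Y[T == 0].mean()

# Experimental world: T is assigned by a coin flip, like a toddler trying
# an action at random, which breaks the C -> T link.
T_rct = rng.integers(0, 2, size=n).astype(float)
Y_rct = true_effect * T_rct + 2.0 * C + rng.normal(size=n)
rct = Y_rct[T_rct == 1].mean() - Y_rct[T_rct == 0].mean()

print(f"true effect {true_effect:.1f}; "
      f"naive observational {naive:.2f}; randomized {rct:.2f}")
```

The naive estimate lands near 3.3 despite a true effect of 1.0; the randomized estimate lands near 1.0.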
“People across fields have been thinking about causality for a long time,” says Rohit Bhattacharya, a computer scientist at Williams College who teaches a course on causal inference and develops methods to infer causality from “messy” data—datasets with missing data and statistical biases such as selection bias. “What’s nice that computer science has offered to the causal community is the language of graphs,” he says, referring to the intuitively appealing system of nodes and causal arrows developed by UCLA computer scientist Judea Pearl, the 2011 recipient of ACM’s A.M. Turing Award. Pearl’s big contribution, according to Bhattacharya, “is this language that allows computer scientists to talk to statisticians, to talk to doctors.” (Pearl’s isn’t the only lingua franca of causal inference: the Neyman-Rubin Causal Model, sometimes known as the potential outcomes framework, also is widely used, particularly in economics, political science, and legal studies.)
The formal language of graphs offers another advantage, adds Bareinboim, the professor at Columbia University who was a doctoral advisee of Pearl’s: creating causal graphical models requires scientists to make all their assumptions about the underlying causal mechanisms explicit, thus paving the way for any attempt to automate causal inference.
Thorough, explicit representation is essential for causal discovery. For a computer scientist, causal discovery involves talking to experts in the subject under investigation to identify all the relevant variables and to create causal graphs representing the competing hypotheses about the relationships among them. “Different patterns of causation between the variables also imply different patterns of association,” Bhattacharya explains, so part of the discovery process is examining the observational data to see which of these hypotheses are consistent with it.
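A minimal sketch of this idea, using a linear partial correlation as a stand-in for a proper conditional-independence test (variable names and coefficients invented): a fork, where C causes both A and B, and a collider, where A and B both cause C, produce opposite patterns of association, which is what lets observational data rule hypotheses out.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

def partial_corr(a, b, c):
    # Correlation between a and b after regressing out c: a linear
    # proxy for testing conditional independence given c.
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

# Fork: C causes both A and B.  A and B are correlated, but the
# correlation vanishes once we condition on C.
C = rng.normal(size=n)
A = C + rng.normal(size=n)
B = C + rng.normal(size=n)
print("fork:     corr(A,B) = %.2f, corr(A,B | C) = %.2f"
      % (np.corrcoef(A, B)[0, 1], partial_corr(A, B, C)))

# Collider: A and B both cause C.  They start out independent, but
# conditioning on the common effect C creates a dependence.
A = rng.normal(size=n)
B = rng.normal(size=n)
C = A + B + rng.normal(size=n)
print("collider: corr(A,B) = %.2f, corr(A,B | C) = %.2f"
      % (np.corrcoef(A, B)[0, 1], partial_corr(A, B, C)))
```

The fork prints a strong marginal correlation that disappears given C; the collider prints the reverse, so the two hypotheses are distinguishable from observational data alone.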
The fact that a particular causal model fits the data, however, does not mean it is possible to use the data to estimate the strength of the effects. Determining whether it is possible is the job of the identification step of causal inference. Here, another major contribution of Pearl’s comes into play: a formal system called the do-calculus (whose name reflects the active nature of causal questions, as in “How does Y change when I do X?”). Among other things, the do-calculus provides a way to determine whether it is possible to compute the causal effect of a new intervention on the outcome of interest. Contrary to the hope that big data holds all the answers, experts in causal inference know this is not always the case, especially when the data is biased or otherwise messy. “The answer might be, ‘It’s impossible, so please don’t make us torture the data for things that aren’t in it,’” says Bhattacharya.
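In the simplest cases, identification reduces to the well-known backdoor adjustment formula, P(Y | do(X=x)) = Σ_z P(Y | x, z) P(z), which the do-calculus generalizes. A sketch on invented binary data, comparing the naive associational contrast with the adjusted one:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

# Hypothetical binary system: Z confounds X and Y (all numbers invented).
Z = rng.random(n) < 0.5
X = rng.random(n) < np.where(Z, 0.8, 0.2)    # Z makes X more likely
Y = rng.random(n) < 0.1 + 0.2 * X + 0.5 * Z  # Y depends on both X and Z

def p(mask):
    return mask.mean()

# Naive, purely associational contrast: P(Y=1 | X=1) - P(Y=1 | X=0).
naive = p(Y[X]) - p(Y[~X])

# Backdoor adjustment: P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z).
adjusted = 0.0
for z in (True, False):
    adjusted += (p(Y[X & (Z == z)]) - p(Y[~X & (Z == z)])) * p(Z == z)

print(f"naive difference:  {naive:.2f}")
print(f"backdoor-adjusted: {adjusted:.2f}  (true effect of X is 0.20)")
```

The naive contrast comes out near 0.50, more than double the true effect of 0.20 that the adjustment recovers.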
If it is possible to do estimation, though, machine learning can help, enabling researchers to create models that capture the complexity of many real-world relationships better than traditional statistical tools such as linear regression.
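As a hedged illustration (the data-generating process and model choices are invented for the example): when a confounder affects the outcome nonlinearly, adjusting with linear regression leaves residual bias, while a flexible learner used as a plug-in estimator does not.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 20_000

# Invented example: confounder C affects treatment T and, nonlinearly,
# the outcome Y.  The true effect of T on Y is 1.0.
C = rng.uniform(-2, 2, size=n)
T = (rng.random(n) < 1 / (1 + np.exp(-2 * C))).astype(float)
Y = T + 2 * np.sin(3 * C) + 0.3 * rng.normal(size=n)

XT = np.column_stack([T, C])

def plug_in_effect(model):
    # Fit Y ~ (T, C), then average the predicted difference between
    # "everyone treated" and "no one treated" at the observed C values.
    model.fit(XT, Y)
    f1 = model.predict(np.column_stack([np.ones(n), C]))
    f0 = model.predict(np.column_stack([np.zeros(n), C]))
    return (f1 - f0).mean()

print("linear regression: %.2f" % plug_in_effect(LinearRegression()))
print("gradient boosting: %.2f" % plug_in_effect(GradientBoostingRegressor()))
```

Here the linear model overshoots the true effect of 1.0 by roughly half a unit because it cannot absorb the sinusoidal dependence on C, while the boosted trees come in close to 1.0.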
Finally, since the causal graph contains assumptions, causal inference requires some sort of sensitivity analysis. “It’s sort of saying, ‘What’s the worst-case scenario?’” explains Bhattacharya. “‘How wrong could my estimates be if what I drew in the graph isn’t quite correct?’”
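In its most stripped-down form (the numbers below are invented), a sensitivity analysis sweeps over how strong an unmeasured confounder would have to be to change the conclusion.

```python
import numpy as np

# Invented numbers: suppose an analysis produced this effect estimate
# under the assumption that the drawn graph is correct.
estimated_effect = 0.8

# If the graph is wrong and an unmeasured confounder shifts the outcome
# by `bias` units between groups, the corrected effect is the estimate
# minus that bias.  Sweep a range of plausible biases.
for bias in np.linspace(0.0, 1.0, 6):
    print(f"assumed confounding bias {bias:.1f} -> "
          f"corrected effect {estimated_effect - bias:+.2f}")
```

The output is less a single corrected number than a threshold: the estimate’s sign flips once the assumed bias exceeds 0.8, and domain experts can then judge whether confounding that strong is plausible.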
This article, written by Marina Krakovsky, appeared in the February 2022 issue of Communications of the ACM.