Tinkering with Causality
As fantastical as it seems, there is mounting evidence from the field of causal inference that time series of sufficiently high resolution can in fact separate causation from correlation. Many of these methods are based on Norbert Wiener’s principle of observational causality, which states that a variable A is causal to a second variable B if the time series of A helps you predict the future of B better than B’s own time series can. In practice this amounts to testing whether the future of B is statistically independent of the past of A once the past of B is taken into account.
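To make Wiener's principle concrete, here is a minimal sketch on simulated data. It assumes a linear, lag-1 relationship, and the series, coefficients, and helper names (residuals, predictive_gain) are all invented for illustration. It compares how well B's next value is predicted from B's own past with and without A's past; a gain near zero is what conditional independence looks like in this setting.

```python
import random


def residuals(y, x):
    """Residuals of a least-squares fit of y on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


def mean_square(values):
    return sum(v * v for v in values) / len(values)


def predictive_gain(cause, effect):
    """Fraction of the effect's lag-1 prediction error removed by the cause's past."""
    effect_now, effect_past, cause_past = effect[1:], effect[:-1], cause[:-1]
    own = residuals(effect_now, effect_past)    # error using the effect's own past
    extra = residuals(cause_past, effect_past)  # part of the cause that past does not explain
    both = residuals(own, extra)                # error after also using the cause's past
    return 1.0 - mean_square(both) / mean_square(own)


# Simulated example (an assumption): A drives B with a one-step delay, not the reverse.
rng = random.Random(0)
n = 2000
a = [rng.gauss(0, 1) for _ in range(n)]
b = [0.0]
for t in range(1, n):
    b.append(0.5 * b[t - 1] + 0.8 * a[t - 1] + rng.gauss(0, 1))

gain_a_to_b = predictive_gain(a, b)
gain_b_to_a = predictive_gain(b, a)
print(f"A -> B gain: {gain_a_to_b:.3f}")
print(f"B -> A gain: {gain_b_to_a:.3f}")
```

In this linear case the gain equals the squared partial correlation between A's past and B's present given B's past, which is why the prediction question and the independence question coincide.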
PCMCI is a promising new method of this kind, which we used to understand the transcriptional causes of estradiol’s effect on breast cancer. It works in two steps, PC and MCI, to find the dependency structure, or causal interactions, of a set of variables observed through time. Here those variables are transcribed loci. PC, named after its inventors Peter Spirtes and Clark Glymour, is a skeleton-discovery algorithm. It starts by assuming that all variables interact and then removes interactions one at a time whenever the two variables they connect are independent conditional on other variables with which they share direct or indirect interactions. This winnows the possible causal drivers of each variable down to what is called its set of parent variables.
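The pruning idea behind the PC step can be sketched on a toy system. This is a deliberately simplified illustration, not the published algorithm: it assumes lag-1 linear dependence, uses partial correlation with a fixed threshold in place of a significance test, and conditions on at most one other variable. The function select_parents and the simulated series X, Y, Z are hypothetical. X drives both Y and Z, so Y and Z look related until X is conditioned on.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def select_parents(series, target, threshold=0.1):
    """Lag-1 candidates that survive an unconditional and a one-variable conditional test."""
    now = series[target][1:]
    past = {name: values[:-1] for name, values in series.items()}
    # Start fully connected, then drop candidates with no marginal dependence.
    strength = {name: abs(corr(past[name], now)) for name in past}
    candidates = [name for name in past if strength[name] >= threshold]
    parents = []
    for name in candidates:
        others = [c for c in candidates if c != name]
        if others:
            # Condition on the strongest competing candidate.
            strongest = max(others, key=strength.get)
            dependence = abs(partial_corr(past[name], now, past[strongest]))
        else:
            dependence = strength[name]
        if dependence >= threshold:
            parents.append(name)
    return sorted(parents)


# Simulated common-driver system (an assumption): X drives itself, Y, and Z.
rng = random.Random(1)
n = 3000
x, y, z = [0.0], [0.0], [0.0]
for t in range(1, n):
    x.append(0.7 * x[t - 1] + rng.gauss(0, 1))
    y.append(0.8 * x[t - 1] + rng.gauss(0, 1))
    z.append(0.8 * x[t - 1] + rng.gauss(0, 1))

series = {"X": x, "Y": y, "Z": z}
parents = {name: select_parents(series, name) for name in series}
print(parents)
```

Every variable starts with all three lagged series as candidate parents. The links from Y and Z are marginally visible only because X is autocorrelated, and they fall away once the past of X is conditioned on.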
In the second step, MCI, the information-theoretic measure momentary information transfer is used to test for the momentary conditional independence that gives the step its name. MCI tests each interaction in the skeleton produced by PC by asking whether the dynamics of the two variables involved, the purported cause and effect, are independent of each other conditional on the parent variables of both. Then, by permuting the order of each time series, PCMCI produces not only an interaction strength but also a p-value: the probability of observing an interaction at least as strong by chance. In this way we used PRO-seq observations of transcription in breast cancer cells to infer the regulatory pathways involved in their responses to estradiol. The information necessary to find these genetic mechanisms exists only in time series.
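The MCI test and its permutation p-value can be illustrated with a rough sketch under stated assumptions. Partial correlation stands in here for momentary information transfer, the parent sets are written down by hand rather than taken from a PC step, and only the cause series is shuffled, which is a crude null model. The function names and the simulated series are hypothetical.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, conds):
    """Correlation of x and y given a list of conditioning series (recursive formula)."""
    if not conds:
        return corr(x, y)
    z, rest = conds[0], conds[1:]
    rxy = partial_corr(x, y, rest)
    rxz = partial_corr(x, z, rest)
    ryz = partial_corr(y, z, rest)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def permutation_p_value(cause, effect, conds, rng, n_perm=200):
    """Interaction strength and the share of shuffles that match or exceed it."""
    observed = abs(partial_corr(cause, effect, conds))
    shuffled = list(cause)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroy the temporal order of the cause
        if abs(partial_corr(shuffled, effect, conds)) >= observed:
            hits += 1
    # Add one to both counts so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Simulated system (an assumption): X drives itself and Y, each with a one-step delay.
rng = random.Random(2)
n = 400
x, y = [0.0], [0.0]
for t in range(1, n):
    x.append(0.6 * x[t - 1] + rng.gauss(0, 1))
    y.append(0.5 * y[t - 1] + 0.6 * x[t - 1] + rng.gauss(0, 1))

# Test the link X[t-1] -> Y[t], conditioning on the other parent of Y[t],
# which is Y[t-1], and on the parent of X[t-1], which is X[t-2].
y_now, x_lag1, y_lag1, x_lag2 = y[2:], x[1:-1], y[1:-1], x[:-2]
strength, p_value = permutation_p_value(x_lag1, y_now, [y_lag1, x_lag2], rng)
print(f"strength = {strength:.3f}, p = {p_value:.4f}")
```

Conditioning on the parents of both the cause and the effect removes each variable's own memory from the test, so the strength reflects only what the cause newly contributes at that moment.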