We are implementing an integrated set of methods that support the graphical representation, discovery, and application of causal knowledge from large and complex biomedical data (see the figure for sample structural causal relationships). We are using two major classes of algorithms: constraint-based algorithms and Bayesian algorithms.
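To illustrate the kind of test a constraint-based algorithm relies on, here is a minimal sketch of a conditional independence check using partial correlation and the Fisher z-transform. It is not our library's implementation: the function names, the simulated chain X → Y → Z, and the linear-Gaussian assumption are all illustrative choices made for this example.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """First-order partial correlation of a and b given a single variable c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for zero (partial) correlation.

    n is the sample size and k the number of conditioning variables.
    Assumes the data are approximately jointly Gaussian.
    """
    z = math.atanh(r)
    stat = math.sqrt(n - k - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))


# Hypothetical simulated data from the causal chain X -> Y -> Z.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 * xi + rng.gauss(0, 1) for xi in x]
z = [0.8 * yi + rng.gauss(0, 1) for yi in y]

r_marginal = pearson(x, z)        # X and Z are dependent through Y
r_cond = partial_corr(x, z, y)    # X and Z should be independent given Y

p_marginal = fisher_z_pvalue(r_marginal, n, 0)
p_cond = fisher_z_pvalue(r_cond, n, 1)
print(f"X vs Z:     r = {r_marginal:.3f}, p = {p_marginal:.3g}")
print(f"X vs Z | Y: r = {r_cond:.3f}, p = {p_cond:.3g}")
```

In a constraint-based search, a large p-value for the conditional test would typically justify removing the direct edge between X and Z, while the small marginal p-value shows the two variables are still associated through Y.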

In partnership with our Systems Architecture group, we are also developing highly efficient versions of these algorithms (e.g., by emphasizing parallelization), so that they are practical to apply to such challenging data.

We are evaluating our algorithms for discovery accuracy and efficiency using both real and simulated data. Our three driving biomedical projects (cancer, lung, and brain) provide real data with which to develop and optimize our algorithms.
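With simulated data the generating graph is known, so discovery accuracy can be scored directly. The sketch below computes adjacency precision and recall, one common way to compare an estimated graph with the true one. The function names and the toy edge lists are hypothetical, and this is not necessarily the metric used in our own evaluations.

```python
def skeleton(edges):
    """Undirected adjacencies: drop edge orientation."""
    return {frozenset(edge) for edge in edges}


def adjacency_precision_recall(true_edges, estimated_edges):
    """Precision and recall of estimated adjacencies against the true graph."""
    truth = skeleton(true_edges)
    estimate = skeleton(estimated_edges)
    true_positives = len(truth & estimate)
    precision = true_positives / len(estimate) if estimate else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Hypothetical example: the true graph is the chain A -> B -> C -> D.
true_edges = [("A", "B"), ("B", "C"), ("C", "D")]
# A made-up algorithm output with one misoriented, one correct, one spurious edge.
estimated_edges = [("B", "A"), ("B", "C"), ("A", "D")]

precision, recall = adjacency_precision_recall(true_edges, estimated_edges)
print(f"adjacency precision = {precision:.2f}, recall = {recall:.2f}")
```

Because orientation is ignored here, the misoriented edge between A and B still counts as a correct adjacency; scoring edge direction would need a separate orientation metric.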

We will evaluate our algorithms and optimized system architecture for usability and acceptability by the biomedical investigators who use our software interface to apply our algorithms to their data. Feedback from their evaluations will drive improvement of the system and its user interface in a continuous feedback cycle.

Our library will comprise the best causal discovery algorithms reported in the literature, including algorithms we have developed, and new causal discovery algorithms needed to support the analysis of large and complex biomedical datasets.