Physicians depend on clinical tests for diagnosis, but these are often inconclusive at early stages of disease, and sometimes cannot distinguish between disease subcategories.

The Lung Diseases project aims to discover the cellular factors that drive susceptibility to, and progression of, chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis.

We are analyzing data from the Lung Genomics Research Consortium and the Lung Tissue Research Consortium to discover and model causal relationships between molecular variables, clinical variables, and image features to characterize disease mechanisms and predict disease severity. The data include high-resolution tissue images as well as SNP, DNA methylation, mRNA expression, and microRNA expression data from patient biospecimens.

However, before we can use omics data in causal modeling, we must address an important issue: most clinical omics data are acquired from homogenized tissues containing multiple cell types. Causal modeling of cellular function is facilitated when measured variables are derived from single cells or, at the very least, from homogeneous cell type populations. Thus, we will first partition the existing omics measurements by tissue type, using the matched images for guidance (see figure for preliminary results).

First, we will identify tissue compartments using our Spectral Blocking approach, which computationally groups image patches into compartments that share a cell type, guided by our pathologist collaborator, Dr. Frank Schneider.
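The details of Spectral Blocking are not given here, so the following is only a generic stand-in: a minimal Python sketch of ordinary spectral clustering that groups image patches by the similarity of per-patch feature vectors. The feature representation, the kernel width `sigma`, and the function name `spectral_group_patches` are hypothetical assumptions for illustration, and the code should not be read as the Spectral Blocking algorithm itself.

```python
import numpy as np


def spectral_group_patches(features, n_compartments, sigma=1.0, n_iter=100):
    """Cluster image patches with generic spectral clustering (illustrative only).

    features: (n_patches, n_features) array of hypothetical per-patch descriptors,
    e.g. color or texture statistics computed from each image patch.
    Returns an integer compartment label for each patch.
    """
    X = np.asarray(features, dtype=float)
    # Gaussian (RBF) affinity between every pair of patches.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized graph Laplacian: L = I - D^(-1/2) W D^(-1/2).
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Eigenvectors of the smallest eigenvalues give a low-dimensional embedding
    # in which weakly connected groups of patches become well separated.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :n_compartments]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)

    # Deterministic farthest-point initialization for k-means.
    centers = [U[0]]
    for _ in range(1, n_compartments):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)

    # Lloyd's k-means iterations on the spectral embedding.
    for _ in range(n_iter):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        new_centers = np.array([
            U[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_compartments)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

In this sketch `sigma` sets how quickly affinity decays with feature distance; too small a value disconnects nearly every patch, and too large a value merges distinct compartments. The resulting groups would still need to be checked against expert annotation.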

We will then partition the omics signals measured across patient tissue samples into compartment-specific signals, using one of several existing methods for deconvolving (unmixing) genomic signals from heterogeneous samples.
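One common formulation of this step, shown here as a sketch rather than the method the project will adopt, is a linear mixing model. It approximates each sample's bulk profile as that sample's compartment proportions times a set of per-compartment profiles, Y ≈ P·S. If P is taken from image-derived compartment fractions, S can be estimated by non-negative least squares. The function name `estimate_compartment_profiles` and the projected-gradient solver are illustrative assumptions.

```python
import numpy as np


def estimate_compartment_profiles(proportions, bulk, n_iter=2000):
    """Estimate per-compartment omics profiles from bulk measurements (sketch).

    Assumes the linear mixing model bulk ~= proportions @ S, where
    proportions is (n_samples, n_compartments), e.g. hypothetical image-derived
    compartment fractions, and bulk is (n_samples, n_features).
    Solves min ||proportions @ S - bulk||^2 subject to S >= 0.
    """
    P = np.asarray(proportions, dtype=float)
    Y = np.asarray(bulk, dtype=float)
    PtP = P.T @ P
    PtY = P.T @ Y
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.eigvalsh(PtP).max()
    S = np.zeros((P.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = PtP @ S - PtY                    # gradient of 0.5 * ||P S - Y||^2
        S = np.maximum(S - step * grad, 0.0)    # project onto S >= 0
    return S
```

Because one set of compartment profiles is shared by all samples in a fit, P needs at least as many samples as compartments (full column rank) for S to be identifiable. To compare profiles between diseases, one option is to fit each patient group separately.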

Finally, we will use causal discovery algorithms to identify molecular interactions, signatures, and pathways that are associated with each disease.
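The project does not name a specific causal discovery algorithm. As one illustrative example, the sketch below implements the skeleton (edge-removal) phase of the PC algorithm. It uses Fisher z tests of partial correlation, which assume roughly Gaussian, linearly related variables. The function names and the significance level `alpha` are assumptions, and edge orientation is omitted.

```python
import itertools
import math

import numpy as np


def partial_corr(C, i, j, cond):
    """Partial correlation of variables i and j given cond, from correlation matrix C."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(C[np.ix_(idx, idx)])
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def fisher_z_independent(C, n, i, j, cond, alpha):
    """True if i and j look conditionally independent given cond at level alpha."""
    r = partial_corr(C, i, j, cond)
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha


def pc_skeleton(data, alpha=0.01):
    """Skeleton phase of the PC algorithm; data is (n_samples, n_variables)."""
    X = np.asarray(data, dtype=float)
    n, p = X.shape
    C = np.corrcoef(X, rowvar=False)
    adj = {v: set(range(p)) - {v} for v in range(p)}
    sepsets = {}
    depth = 0
    # Grow the conditioning-set size while some node still has enough neighbors.
    while any(len(adj[v]) - 1 >= depth for v in range(p)):
        for i in range(p):
            for j in list(adj[i]):
                others = adj[i] - {j}
                if len(others) < depth:
                    continue
                for cond in itertools.combinations(sorted(others), depth):
                    if fisher_z_independent(C, n, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepsets[frozenset((i, j))] = set(cond)
                        break
        depth += 1
    edges = {tuple(sorted((i, j))) for i in range(p) for j in adj[i]}
    return edges, sepsets
```

For a chain X0 → X1 → X2, this procedure should keep the edges X0–X1 and X1–X2 and record {X1} as the set separating X0 from X2. The separating sets are what the later orientation phase of PC would use.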