Jonas Peters, PhD, Associate Professor of Statistics, Department of Mathematical Sciences, University of Copenhagen (Denmark), “Invariance and Causality” at 11:00 am on Thursday, May 17, 2018, in Rooms 407A/B BAUM, 5607 Baum Blvd., the Offices at Baum.
Abstract: Why are we interested in the causal structure of a process? In classical prediction tasks such as regression, for example, it seems that no causal knowledge is required. In many situations, however, we want to understand how a system reacts under interventions, e.g., in gene knock-out experiments. Here, causal models become important because they are usually considered invariant under those changes. A causal prediction uses only direct causes of the target variable as predictors; it remains valid even if we intervene on predictor variables or change the whole experimental setting. In this talk, we show how we can exploit this invariance principle to estimate causal structure from data. We apply the methodology to data sets from biology, epidemiology, and finance.
Biography: Before joining MATH/Copenhagen, Dr. Peters led the causality group at the Max Planck Institute for Intelligent Systems, Tübingen. Before that, he was a Marie Curie fellow at the Seminar for Statistics, ETH Zurich.
Dr. Peters studied mathematics in Heidelberg and Cambridge and did his PhD with B. Schölkopf, D. Janzing and P. Bühlmann. His thesis received the ETH medal. He has worked with L. Bottou at Microsoft Research Redmond (WA, USA), M. Wainwright at UC Berkeley (CA, USA) and Peter Spirtes at CMU (PA, USA).
His research focuses mainly on causal inference: learning causal structures either from purely observational data or from a combination of observational and interventional data. The work involves developing both theory and methodology, and relates to areas such as high-dimensional statistics, computational statistics, and graphical models. It’s an exciting research area with lots of open questions!
Joris M. Mooij, PhD, Associate Professor, Informatics Institute, University of Amsterdam (the Netherlands), “Validating Causal Discovery Methods” at 11:00 am on Thursday, April 19, 2018, in Rooms 407A/B BAUM, 5607 Baum Blvd., the Offices at Baum.
Abstract: Since the pioneering work by Peirce and Fisher, the gold standard for causal discovery is a randomized experiment. An intriguing alternative approach to causal discovery was proposed in the nineties, based on conditional independence patterns in the data. Over the past decades, dozens of causal discovery methods based on that idea have been proposed. These methods clearly work on simulated data when all their assumptions are satisfied. However, demonstrating their usefulness on real data has been a challenge. In this talk, I will discuss some of our recent attempts at validating causal discovery methods on large-scale interventional data sets from molecular biology. I will discuss a micro-array gene expression data set and a mass cytometry data set that seem perfectly suited for validation of causal discovery methods at first sight. As it turns out, however, both causal discovery on these data and the validation of such methods are more challenging than one might think initially.
We find that even sophisticated modern causal discovery algorithms are outperformed by simple baselines on these data sets.
Joint work with Philip Versteeg and Tineke Blom.
Biography: Joris M. Mooij studied mathematics and physics and received his PhD degree with honors from the Radboud University Nijmegen (the Netherlands) in 2007. His PhD research concerned approximate inference in graphical models. During the next three years, he worked on causal discovery as a postdoc at the Max Planck Institute for Biological Cybernetics in Tübingen (Germany). In 2011 he obtained an NWO VENI grant, which allowed him to do a second postdoc, this time at the Radboud University Nijmegen. In 2013 he became Assistant Professor at the Informatics Institute of the University of Amsterdam (the Netherlands). In the following years, he obtained an NWO VIDI grant and an ERC Starting Grant, allowing him to start his own research group, consisting of 3 PhD students and 2 postdocs, focusing entirely on causal discovery. In 2017 he was promoted to Associate Professor. He has published several international peer-reviewed papers and has won several awards for his work.
David Jensen, DSc, Professor, College of Information and Computer Sciences, University of Massachusetts Amherst, “The Case for Empirical Evaluation of Methods for Causal Modeling” at 11:00 am on Thursday, February 15, 2018, in Rooms 407A/B BAUM, 5607 Baum Blvd., The Offices at Baum.
Abstract: A variety of methods have been developed for constructing causal models. These include methods for estimating the structure and parameters of causal graphical models, as well as a large number of methods for estimating individual causal dependencies (e.g., propensity score methods). The primary evidence for the effectiveness of these methods is based on either theoretical proofs or performance on synthetic data. In this talk, I review the state of this evidence, and argue that empirical evaluation is a virtual necessity for the field to progress. I show how the progress of non-causal modeling methods was transformed in the 1980s and 1990s by a focus on empirical evaluation. I describe a set of techniques for empirical evaluation of methods for causal modeling, including some novel data sets and evaluation techniques developed in my research group. Finally, I briefly survey several practical issues that are likely to arise if empirical evaluation becomes the norm, and how considering these issues could significantly advance the field of causal modeling.
Biography: Dr. Jensen is a Professor of Computer Science at the University of Massachusetts Amherst. His research focuses on methods to learn accurate causal models of large social, technological, and computational systems. He regularly serves on program committees for several conferences, including the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, the IEEE International Conference on Data Mining, the International Conference on Machine Learning, and the Conference on Uncertainty in Artificial Intelligence. He has served on the Board of Directors of the ACM Special Interest Group on Knowledge Discovery and Data Mining (2005-2013), the Defense Science Study Group (2006-2007), and DARPA’s Information Science and Technology Group (2007-2012). In 2011, he received the Outstanding Teacher Award from the UMass College of Natural Sciences. In 2017, one of his papers received the IEEE INFOCOM Test of Time Paper Award.
Jonas Almeida, PhD, Professor and Chief Technology Officer, Department of Biomedical Informatics, Stony Brook University (SUNY), “Data Science for Biomedical Informatics in the Planet of the Apps” at 11:00 am on Thursday, November 30, 2017, in Rooms 407A/B BAUM, 5607 Baum Blvd., the Offices at Baum.
Abstract: As in all new fields of academic inquiry, Data Science starts with an identity crisis. So, what’s new about the way Data Science derives from Computer Science, Biostatistics and Genomic Atlases? Does the deployment of interoperable Data Spaces from Genomic Data Commons to HL7 FHIR, the commoditization of Cloud Computing, or the optimized classification with Machine Learning, fundamentally contribute to answering important questions? What about Precision Medicine, does Data Science even play a role in that translation beyond being a toolbox? This discussion will be illustrated with examples of how Data Science already contributes to some of these endeavors, and how it could for many others, as it matures into a quantitative framework that is both pervasive and participated. It will be argued, and illustrated with published work, that Data Science opens a number of novel avenues in quantitative research that go beyond its immediate applications to the delivery of HealthCare. Bring your laptop if you want to try the examples as they are presented.
Biography: In January 2015, Dr. Almeida accepted the new position of Professor and Chief Technology Officer at the Biomedical Informatics Department of Stony Brook University (State University of NY, Long Island). This follows 4 years as the inaugural director of a new Division in Informatics in the Department of Pathology of the University of Alabama at Birmingham (UAB), and 5 years as Professor of Bioinformatics in the Division of Applied Mathematics of the University of Texas MD Anderson Cancer Center (2005-2010).
His current research interests are at the intersection of Semantic Web abstractions and distributed Cloud Computing approaches to Bioinformatics application development in the pervasive Web Platform. The use of computational statistics at the intersection of those two fields now gets a fancy new name, Big Data Science, which is also the focus of his educational and service activities. This research pulls together threads from past, and ongoing, work on mathematical modeling and machine learning for Medical Genomics, at a time when these fields are challenged by the increasingly data-driven nature of modern Biomedical research. In his own work this has often focused on The Cancer Genome Atlas (TCGA), a Biomedical Big Data resource that enables, and requires, this new synthesis for the development of Personalized Medicine applications. As Population Health data becomes available in real-time (see for example http://bit.ly/pqiSuffolk), the opportunities for pursuing Machine Learning as a pervasive Web Computing exercise are emerging, with new avenues for research in Artificial Intelligence applications embedded in the increasingly patient-facing Health-Care enterprise.
ChengXiang Zhai, PhD, Professor and Willett Faculty Scholar, Department of Computer Science, University of Illinois, “Statistical Approaches to Analysis of Traditional Chinese Medicine Practice Records” at 11:00 am on Thursday, May 18, 2017, in Rooms 407A/B BAUM, 5607 Baum Blvd., The Offices at Baum.
Abstract: Traditional Chinese Medicine (TCM) can provide important complementary medical care to modern medicine, and is widely practiced in China and many other countries. Recently, TCM patient records have been digitized, leading to a large number of online patient records. The data contains potentially valuable knowledge about diagnosis and treatment of various diseases using the TCM methodology and thus creates an interesting opportunity to apply data mining techniques to extract such knowledge. In this talk, I will present some of our recent work on using statistical approaches to analyze TCM patient records for disease profiling, disease subcategorization, and survival analysis. In disease profiling, we propose a new probabilistic model for the joint analysis of symptoms, diagnoses, and herbs in patient records to discover the typical symptoms and typical herbs associated with different diseases. In disease subcategorization, we study how to cluster patient records to discover subcategories of diseases and show that we can use machine learning to leverage the knowledge in a TCM dictionary of herb functions for improving the accuracy of subcategorization. In survival analysis, we cluster lung cancer patients, compare the survival time of different clusters of patients, and show that integration of medical records with molecular interaction networks and a TCM knowledge graph is effective for addressing the problem of missing data in the medical records. The experimental results on multiple TCM patient data sets show the benefit of integrating medical records with other biomedical knowledge bases and the promise of leveraging TCM patient records for improving precision medicine.
Biography: ChengXiang Zhai is a Professor of Computer Science and a Willett Faculty Scholar at the University of Illinois at Urbana-Champaign (UIUC), where he is also affiliated with the Institute for Genomic Biology, Department of Statistics, and School of Information Sciences. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research interests are in the general area of intelligent information systems, including specifically information retrieval, data mining, and their applications in biomedical and health informatics, and intelligent education systems. He has published over 200 widely cited papers in these areas and a textbook on text data management and analysis. He is an Editor-in-Chief of Springer’s Information Retrieval Book Series and an Associate Editor of BMC Medical Informatics and Decision Making, and previously served as an Associate Editor of ACM Transactions on Information Systems and of Elsevier’s Information Processing and Management. He is an ACM Distinguished Scientist and has received a number of awards, including the ACM SIGIR Test of Time Paper Award (three times), the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), an Alfred P. Sloan Research Fellowship, an IBM Faculty Award, an HP Innovation Research Award, and the UIUC Campus Award for Excellence in Graduate Student Mentoring.
Thomas Richardson, PhD, Professor and Chair, Department of Statistics, University of Washington, “Nested Markov Models” at 11:00 am on Thursday, April 20, 2017, in Rooms 407A/B BAUM, 5607 Baum Blvd., The Offices at Baum.
Abstract: Directed acyclic graph (DAG) models may be characterized in several different ways: via a factorization, via d-separation or a local Markov property. It has been known for a long time that marginals of DAG models also imply equality constraints that are not conditional independences. The well-known ‘Verma constraint’ is an example.
In this talk, we will show that equality constraints of this type can be viewed as conditional independences in kernel objects obtained from joint distributions via a fixing operation that generalizes conditioning and marginalization. We use these constraints to define a graphical model, called the “nested Markov model”, that is associated with acyclic directed mixed graphs (ADMGs).
Naturally associated with a DAG with latent variables is an ADMG known as the “latent projection”. The nested Markov model associated with an ADMG is a (smooth) supermodel of the model associated with the original latent variable model. Nested Markov models thus constitute a natural class in which to perform causal model search.
This is joint work with Robin Evans (Oxford), James Robins (Harvard) and Ilya Shpitser (Johns Hopkins).
Biography: Dr. Richardson is Professor and Chair of the Department of Statistics. He is also an Adjunct Professor in the Departments of Economics and Electrical Engineering and a member of the eScience Steering Committee. He received his BA in Mathematics & Philosophy from the University of Oxford and his MS and PhD in Logic, Computation & Methodology from Carnegie Mellon University. He is a Fellow of the Center for Advanced Study in the Behavioral Sciences at Stanford University. His research interests include Graphical Models and Causality.