March 28, 2016 @ 12:00 pm – 1:00 pm
Hamburg Hall 1502, CMU

Susan Athey, PhD
Stanford University

This talk will cover a pair of papers that modify popular machine learning methods to estimate how treatment effects vary with observable unit characteristics. Applications include large-scale field experiments and online A/B tests. The first paper develops “causal trees,” a modification of regression trees. The standard algorithm is modified so that its criteria focus on heterogeneous treatment effects rather than predicting outcomes; this requires confronting the fact that for causal inference, the “ground truth” is not observed, and so the analog of a mean-squared error criterion is infeasible. Further, we make use of “honest” estimation, whereby one sample is used to estimate the tree structure and a second is used to estimate treatment effects and calculate standard errors. The criteria for model selection and cross-validation are modified to anticipate that estimation will be honest and thus unbiased, but variance remains a concern. Our method can generate estimates of heterogeneous treatment effects and provide valid confidence intervals without any restrictions on the number of covariates relative to the number of observations or the richness of the data-generating process. The second paper proposes the first asymptotic normality results for random forests, one of the most popular and successful prediction methods. We then extend the results to estimate heterogeneous treatment effects in experimental or observational studies, and provide a consistent estimator for the standard error. The method creates a forest using, for example, causal trees. Using simulations, we show that our approach can provide a substantial improvement (in terms of both mean-squared error and confidence interval coverage) over existing non-parametric methods for which asymptotic theory was available (e.g., kernel or k-nearest-neighbor matching).
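
As a rough illustration of the honest estimation idea described in the abstract, the Python sketch below fits a single-split "tree" on simulated data from a randomized experiment. It is not the speakers' implementation. All function names, the simulated data, and the splitting score (leaf shares times the squared difference in leaf effects) are assumptions made for illustration; the score is a simplified stand-in for the criterion developed in the paper. One half of the sample chooses the split, and the other half estimates the leaf treatment effects and their standard errors.

```python
import numpy as np


def leaf_effect(y, w):
    """Treated-minus-control difference in mean outcomes and its standard error."""
    treated, control = y[w == 1], y[w == 0]
    effect = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / treated.size
                 + control.var(ddof=1) / control.size)
    return effect, se


def choose_split(x, y, w, candidates, min_arm=10):
    """Pick the threshold on x that maximizes a simplified heterogeneity score.

    Assumption: this score is illustrative only, not the paper's criterion.
    """
    best_threshold, best_score = None, -np.inf
    for c in candidates:
        left = x <= c
        right = ~left
        # Require enough treated and control units in each leaf.
        counts = [np.sum(w[mask] == arm) for mask in (left, right) for arm in (0, 1)]
        if min(counts) < min_arm:
            continue
        tau_left, _ = leaf_effect(y[left], w[left])
        tau_right, _ = leaf_effect(y[right], w[right])
        score = left.mean() * right.mean() * (tau_left - tau_right) ** 2
        if score > best_score:
            best_threshold, best_score = c, score
    return best_threshold


def honest_stump(x, y, w, rng):
    """Choose the split on one half of the data, estimate effects on the other."""
    idx = rng.permutation(x.size)
    half = x.size // 2
    structure, estimation = idx[:half], idx[half:]

    candidates = np.quantile(x[structure], np.linspace(0.1, 0.9, 17))
    threshold = choose_split(x[structure], y[structure], w[structure], candidates)
    if threshold is None:
        raise ValueError("no admissible split found")

    # Honest step: the estimation sample played no role in choosing the split.
    xe, ye, we = x[estimation], y[estimation], w[estimation]
    left = xe <= threshold
    return threshold, leaf_effect(ye[left], we[left]), leaf_effect(ye[~left], we[~left])


# Simulated experiment (assumed): the true effect is 2 when x > 0 and 0 otherwise.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1.0, 1.0, size=n)
w = rng.integers(0, 2, size=n)
y = x + np.where(x > 0, 2.0, 0.0) * w + rng.normal(size=n)

threshold, (tau_l, se_l), (tau_r, se_r) = honest_stump(x, y, w, rng)
print(f"split at x <= {threshold:.2f}")
print(f"left leaf:  effect {tau_l:.2f} (s.e. {se_l:.2f})")
print(f"right leaf: effect {tau_r:.2f} (s.e. {se_r:.2f})")
```

Because the estimation half is independent of the split choice, the leaf estimates are ordinary difference-in-means estimates for fixed subgroups, which is why conventional standard errors apply. The cost is that each step uses only half of the data.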