The Center for Causal Discovery is working together with other BD2K Centers to promote novel methods to analyze Big Data and to explore interoperability with tools and software developed by other Centers.
The Center for Causal Discovery (CCD) and the University of Puerto Rico – Rio Piedras (UPR) are collaborating on BD2K educational and research initiatives. These activities are funded by a BD2K R25 program to UPR and a BD2K supplement to the CCD. The R25 Principal Investigators (PIs) are Dr. Jose Garcia-Arraras, Dr. Patti Ordonez, and Maria Eglee Perez-Hernandez, all from UPR Natural Sciences. Additional participating R25 faculty include Dr. Luis Pericchi Guerra and Dr. Humberto Ortiz from UPR, and Dr. Rafael Irizarry from Harvard University. CCD faculty leadership in this collaboration includes Dr. David Boone, Dr. Joseph Ayoob, Dr. Richard Scheines, and Dr. Gregory Cooper (PI).
We jointly created a summer internship program that began in 2016. Six undergraduate students from UPR participated in 10-week research internships in Pittsburgh on projects related to the analysis of big biomedical data. The students conducted research under the guidance of CCD faculty and heard lectures on big data topics including machine learning, advanced statistics, and cancer genomics. They also participated in journal clubs discussing primary research papers, joined roundtable discussions in which graduate students and postdocs described their work, and attended career development workshops. At the conclusion of the internship, each student presented a talk and a poster about their research. The CCD has accepted six new UPR undergraduate students into the 2017 summer internship program.
Faculty from the CCD project have traveled to UPR to present BD2K research talks and to meet with faculty and students there to discuss research and educational programs. Similarly, faculty from UPR have visited Pittsburgh to participate in workshops and discuss research. This year, we intend to collaborate with UPR faculty on the design of BD2K curriculum materials for use at UPR. We also plan to engage further with UPR faculty on joint research related to causal modeling and discovery.
PIC-SURE – A Proof-of-principle Federated Data Ecosystem for Big Biomedical Data Storage and Analysis in the Cloud:
We are developing a system that shares data and computational resources in a cloud environment (or another externally hosted system). With these services integrated, researchers will have a data framework for better understanding the causes of various diseases and conditions and for gaining insight into possible treatments. We will focus on enabling authenticated access to secure data and analysis modules across institutional boundaries, a prerequisite for productive collaboration among institutions that host large biomedical data repositories and computational infrastructures. The PIC-SURE team is developing an application programming interface (API) that will allow the CCD team to access the autism Simons Simplex Collection (SSC) dataset through simple Representational State Transfer (REST) calls and obtain data for causal analysis. The resulting proof-of-principle federated data ecosystem can serve as a model for how other institutions share biomedical data and analytic services securely and scalably in a cloud environment.
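To give a sense of what an authenticated RESTful call of this kind might look like from the client side, here is a minimal Python sketch. It is an illustration only and does not describe the actual PIC-SURE API: the base URL, the `ssc/phenotypes` resource path, the `fields` query parameter, the bearer-token scheme, and the `records` key in the response are all assumptions made for the example. The request is built but never sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; the real PIC-SURE endpoints are not described here.
BASE_URL = "https://picsure.example.org/api"


def build_query_request(base_url, resource, fields, token):
    """Build (but do not send) an authenticated GET request for selected fields."""
    # Assumed convention: requested variables go in a comma-separated "fields" parameter.
    query = urllib.parse.urlencode({"fields": ",".join(fields)})
    url = f"{base_url}/{resource}?{query}"
    return urllib.request.Request(
        url,
        headers={
            # Assumed bearer-token authentication.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def parse_records(body):
    """Decode a JSON response body; assumes rows are listed under a "records" key."""
    payload = json.loads(body)
    return payload["records"]


# Hypothetical resource path, variable names, and token.
request = build_query_request(BASE_URL, "ssc/phenotypes", ["age", "iq"], "demo-token")
print(request.full_url)
```

In a real deployment, the request would be sent with `urllib.request.urlopen(request)` and the decoded rows handed to the causal analysis tools. Keeping authentication in a request header is one common way to let the same call work across institutional boundaries.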