CPH faculty and students have access to data and computing environments from both universities, with the option to choose the best combination of resources for each project.
UCSF and UC Berkeley jointly provide access to a wealth of analysis-ready clinical and health data, including detailed electronic health record (EHR) data from all patients seen at UCSF and de-identified data from patients seen at any of the five University of California medical campuses. A good introduction to working with UCSF clinical data is available at UCSF Data Resources. Access to de-identified data does not require Institutional Review Board (IRB) approval. Access to P4-level fully identified data requires requesters to have their own project-specific IRB approval.
- De-Identified UCSF Health Data
Major de-identified data resources include the Clinical Data Warehouse (CDW) (login required), which contains clinical, financial, and utilization data, and Information Commons, a cloud-based repository of CDW data plus machine-redacted clinical notes, extracted concepts, and images.
- Identified UCSF Health Data
Access to identified data requires a consultation and IRB approval. Available identified data include data from Clarity, the CDW, and Information Commons (OMOP format).
- UC Health Data Warehouse
The UC-wide Health Data Warehouse provides access to de-identified electronic health record data from over 7 million patients seen since 2012 at any of the five University of California medical centers (San Francisco, Davis, Los Angeles, San Diego, Irvine). The data are harmonized and stored in the OMOP format.
- Other Data Sets
Additional imaging, population health, health insurance, and other data sets are available through UCSF.
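Because several of the resources above are delivered in the OMOP format, a cohort count can often be written once against the standard table names. The following is a minimal sketch, not a description of how any UCSF or UC warehouse is actually accessed: it builds a toy in-memory SQLite database using two OMOP Common Data Model tables (`person` and `condition_occurrence`) and counts distinct patients with a given condition concept. The helper names, the sample rows, and the concept IDs are illustrative assumptions.

```python
import sqlite3


def build_toy_omop_db():
    """Create an in-memory database with a tiny subset of OMOP CDM tables.

    The rows and concept IDs below are made-up sample values for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            year_of_birth INTEGER
        );
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            condition_concept_id INTEGER,
            condition_start_date TEXT
        );
        INSERT INTO person VALUES (1, 1960), (2, 1975), (3, 1988);
        INSERT INTO condition_occurrence VALUES
            (10, 1, 201826, '2015-03-01'),
            (11, 1, 201826, '2016-07-12'),
            (12, 2, 201826, '2018-01-20'),
            (13, 3, 316866, '2019-05-05');
        """
    )
    return conn


def count_patients_with_condition(conn, condition_concept_id):
    """Count distinct persons with at least one record of the condition concept."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT person_id) "
        "FROM condition_occurrence "
        "WHERE condition_concept_id = ?",
        (condition_concept_id,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_toy_omop_db()
    # Person 1 has two records for this concept but is counted only once.
    print(count_patients_with_condition(conn, 201826))
```

Counting distinct `person_id` values rather than rows matters because a single patient usually has many condition records. A real warehouse would be reached through its own approved connection method and credentials, which this sketch does not attempt to model.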
Secure Research Data and Compute (SRDC)
Berkeley Research IT operates a suite of computational platforms and services designed to meet the needs of researchers from across the academic spectrum, including biomedical research. The Secure Research Data and Compute (SRDC) platform ranges from private cloud virtual machines (VMs) to high-performance computing (HPC). The virtual machine environment and the HPC cluster share a high-performance parallel file storage system, so a project can compute on both platforms as needed.
Researchers have access to consulting and design expertise in data security and privacy, data architecture and data management, and performance tuning and optimization.
Analytics Environments on Demand (AEoD) / Savio
Virtual machines and high-performance computing are also offered at a lower security level for researchers who are working with less sensitive data. These services are the Analytics Environments on Demand (AEoD) service and the Savio high-performance computing cluster.
Wynton and Information Commons
UCSF Wynton is a high-performance computing architecture sponsored by the Bakar Computational Health Sciences Institute (BCHSI) and other campus groups. The current size and specifications of the CPU and GPU nodes are available at wynton.ucsf.edu under the About tab.
The Information Commons Shared Cluster is an AWS EMR cluster based on a Spark AI environment with tools for analyzing multi-factor and multi-modal UCSF clinical data as described above. Additional tools are planned as we launch support for imaging exploration (a.k.a. Imaging Commons), molecular 'omics (a.k.a. Omics Commons), and new clinical text analysis.
Separately, computing resources are available through collaborative access to national laboratories and supercomputing centers, including the San Diego Supercomputer Center and the Lawrence Berkeley National Laboratory.
Contact Info.Commons@ucsf.edu for more details.
APeX Enabled Research (AER)
UCSF Health uses the Epic® electronic medical record, which we call APeX. If you have an algorithm, digital tool, or intervention that has strong evidence of feasibility and acceptability, and you want to study it within APeX, you must consult APeX Enabled Research (AER) for assistance and approval.
Examples of projects that are overseen by AER include:
- A digital intervention consisting of a predictive algorithm and a clinical decision support tool to encourage sleep promotion order sets
- An external digital visualization tool that helps graph neonatal weight loss
- A research plan that includes the use of alerts, order sets, notes, in-box messages, or other tools within APeX
Integration with live clinical systems is highly complex and requires multiple layers of governance and approvals. Additional oversight and monitoring processes are being developed for AI/ML solutions used in live clinical care.
COMMUNITIES OF PRACTICE
NLP@UCSF brings together learners, users, and doers to share and discuss applications of natural language processing (NLP) to healthcare, clinical, and biomedical research.