Informatics as a Public Good

Digital innovations are transforming health care and research. Some of these innovations originate from research funded by federal agencies and non-profit organizations. Others originate in the commercial sector, where they are overwhelmingly built and scaled by for-profit companies. However, some informatics technologies, such as data, vocabulary, and interface standards, are better thought of as public goods, which commercial markets typically undersupply. As ever more data are needed to drive artificial intelligence and machine learning, informatics standards are becoming increasingly important for breaking down silos and fostering more data sharing and integrated solutions.

This talk will describe the technical and policy landscape of health data sharing in the United States, and will discuss how two non-profit organizations, Open mHealth and Vivli, are advancing informatics-powered data sharing through public goods approaches.

Ida Sim, MD, PhD, is a primary care physician, informatics researcher, and entrepreneur. She is a Professor of Medicine and the UCSF Director of the UCSF-UC Berkeley Joint Program in Computational Precision Health. Her other UCSF positions include Director of Digital Health for the Division of General Internal Medicine and Co-Director of Informatics and Research Innovation at UCSF’s Clinical and Translational Sciences Institute. Dr. Sim is a global leader in the technology and policy of large-scale health data sharing.

Dr. Sim is a co-founder of Open mHealth, a non-profit organization that is breaking down barriers to mobile health app and data integration through an open software architecture. Open mHealth's work has been adopted as an IEEE family of global standards; IEEE 1752.1 was officially approved in 2021. Dr. Sim has multiple grants from NIH, NSF, and AHRQ on mobile health methodology and digital health for primary care. In 2019, she co-developed CommonHealth, an open-source software suite that brings to the Android ecosystem the equivalent of Apple Health’s ability to access and share EHR data.

Expanding the Paradigm of Microbial Genome Annotation

The standard paradigm for computationally analyzing a newly sequenced microbial genome is to assemble sequencer reads into longer contigs, to use gene-finding programs to identify the locations of genes within those contigs, and to use sequence-similarity searches and hidden Markov models (HMMs) to assign gene functions. In recent years that paradigm has been extended in multiple respects through the development of inference tools that extract additional information from the genome. Metabolic reconstruction techniques predict the qualitative metabolic network of the organism and generate a quantitative metabolic model for it. Pathway hole filling identifies genes coding for missing pathway enzymes, and reaction gap filling identifies missing metabolic reactions. We also present tools for inferring transport reactions and protein complexes.
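To make the idea of reaction gap filling concrete, the following is a minimal sketch under stated assumptions, not the method used by Pathway Tools. It treats each reaction as a pair of substrate and product sets, and searches by brute force for the smallest set of candidate reactions that makes a target metabolite producible from a set of seed nutrients. The reaction names (R1 to R5), the metabolites (A to E), and the functions producible and gap_fill are all hypothetical, invented for illustration. Practical gap fillers typically use optimization techniques rather than exhaustive search.

```python
from itertools import combinations


def producible(reactions, seeds):
    """Return every metabolite reachable from the seed set.

    Each reaction is a (substrates, products) pair of sets. A reaction
    fires once all of its substrates are available.
    """
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def gap_fill(model, universal, seeds, targets):
    """Find a smallest set of candidate reactions that fills the gaps.

    Tries candidate subsets in order of increasing size and returns the
    names of the first subset that makes all targets producible, or
    None if no subset works. Exhaustive search: toy networks only.
    """
    model_reactions = list(model.values())
    names = sorted(universal)
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            trial = model_reactions + [universal[name] for name in combo]
            if targets <= producible(trial, seeds):
                return set(combo)
    return None


# Hypothetical toy network: the draft model lacks the step from B to C.
MODEL = {
    "R1": ({"A"}, {"B"}),
    "R3": ({"C"}, {"D"}),
}

# Hypothetical reference database of candidate reactions.
UNIVERSAL = {
    "R2": ({"B"}, {"C"}),
    "R4": ({"A"}, {"E"}),
    "R5": ({"E", "B"}, {"D"}),
}

FILLED = gap_fill(MODEL, UNIVERSAL, seeds={"A"}, targets={"D"})
print(FILLED)
```

In this toy case the draft model cannot produce D from A, and adding the single candidate R2 is enough to close the gap, so the search stops before considering the two-reaction alternative of R4 plus R5.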

The preceding inference tools are available within the Pathway Tools software suite, and can be applied to any newly sequenced genome. We have processed 20,000 genomes using these tools to create the BioCyc collection of Pathway/Genome Databases. The BioCyc website provides an extensive set of bioinformatics tools for searching and analyzing these databases, and for leveraging them in the analysis of omics datasets. Genome-related tools include a genome browser, sequence searching and alignment, and extraction of sequence regions. Pathway-related tools include pathway diagrams, a tool for navigating zoomable organism-specific metabolic map diagrams, and a tool for searching for metabolic routes that connect metabolites of interest. Regulation tools depict operons and regulatory sites, and show full organism regulatory networks. Comparative analysis tools enable comparisons of genome organization, of orthologs, and of pathway complements. Omics data analysis tools support enrichment analysis and the painting of transcriptomics and metabolomics data onto individual pathway diagrams and onto zoomable metabolic map diagrams. A new Omics Dashboard tool enables interactive exploration of omics datasets through a hierarchy of cellular systems.

Peter D. Karp is the director of the Bioinformatics Research Group within the Artificial Intelligence Center at SRI International. Dr. Karp has authored 190 publications in bioinformatics and computer science, in areas including metabolic pathway bioinformatics, computational genomics, scientific visualization, and biological databases. He developed the Pathway Tools software, the EcoCyc and MetaCyc databases, and the BioCyc database collection. He is a Fellow of the American Association for the Advancement of Science and of the International Society for Computational Biology. He received his Ph.D. in Computer Science from Stanford University in 1989, and was a postdoctoral fellow at the NIH National Center for Biotechnology Information.