Researchers receive funding to facilitate translation of biomedical data

Big data has become a big deal.

Advanced computing technology has enabled the collection of huge amounts of data on topics ranging from weather and traffic patterns to human health and disease, but translating all of that data into usable knowledge is a major challenge.

To meet this challenge in the biomedical field, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative in 2012. The trans-NIH initiative was established to enable biomedical research as a digital research enterprise, to facilitate discovery and support new knowledge, and to maximize community engagement.

The University of Delaware’s Cathy Wu and collaborator Vijay Shanker are part of a team led by Georgetown University that received $1.4 million in BD2K funding in 2015.

Now, Wu, the Unidel Edward G. Jefferson Chair of Bioinformatics and Computational Biology, has received two new grants from NIH to continue building the capacity to use big data for precision medicine, which is aimed at defining disease-driving pathways in individual patients and designing targeted therapies that have maximal efficacy with minimal side effects.

One of the projects brings a computational linguistics approach to translate free texts in scientific literature into knowledge bases that can be queried by users, while the other is aimed at cataloguing molecular signatures for understanding the cellular consequences of genetic and environmental perturbations, including drug responses.

“One of the major challenges in extracting information from scientific texts is that authors often write the same information in many different ways and using different styles,” says Shanker, who is partnering with Wu on the first grant. “We need to capture crucial information in a structured form so that other researchers and clinicians can access it easily, while informaticians can manipulate it programmatically.”

Wu explains that there is a great deal of information in the literature about how genomic mutations affect drug response and disease development, but extracting this information is not easy.

“But once it’s organized into a database, we can ask questions like ‘How does this mutation affect the response to a particular cancer drug?’” she says. “That’s what precision, or personalized, medicine is all about: the more we know about people’s genetic makeup and about how their genes affect the efficacy of a particular treatment, the more targeted their therapy can be.”

Previous systems built by Wu and others generally target one specific type of information, but the approach Wu and Shanker are taking will provide a general framework that allows new systems to be built quickly as new data are generated on a broad range of topics.

The second project, led by researchers at Mount Sinai, is funded through the BD2K Data Coordination and Integration Center. The center facilitates the broad use of biomedical digital assets generated by the LINCS (Library of Integrated Network-Based Cellular Signatures) initiative by making them “FAIR” (findable, accessible, interoperable, and reusable) so that others can use them to make new discoveries.

Wu and her team are currently mining the data to determine why kinase inhibitors, which are seeing increased use in cancer treatment, work for some patients and not others.

“We’re a small part of a large national initiative, but with everyone working together, we can have a big impact,” Wu says. “This work is allowing us to start overlaying data from patients on data from the scientific literature to answer questions about the development and treatment of disease.”

About the grants and the researchers

Cathy Wu is the Unidel Edward G. Jefferson Chair of Bioinformatics and Computational Biology and director of UD’s Center for Bioinformatics and Computational Biology. She is a professor in the Department of Computer and Information Sciences and the Department of Biological Sciences. Vijay Shanker is a professor in the Department of Computer and Information Sciences.

“Semantic Literature Annotation and Integrative Panomics Analysis for PTM-Disease Knowledge Network Discovery” (grant No. U01GM120953-01) totals approximately $1.1 million and runs from Aug. 5, 2016, to July 31, 2019.

“BD2K-LINCS DCIC eDSR Data Science Research: Collaborative Resource for LINCS Panomics PTM Knowledge Network” (grant No. 5U54HL127624-03) runs from May 1, 2016, to April 30, 2018.

Wu is principal investigator of the UD site, which is funded at $260,000. The multi-institutional grant totals $8 million.