Dr. Olivier Lichtarge is constantly pushing his research group hard to think of innovative ways to analyze the massive amount of data available to scientists to help solve complex problems in biomedical research.
In 2011, a presentation from IBM experts about their much-buzzed-about “supercomputer” Watson, together with a call to action from the scientific community to tackle rapidly evolving big data analysis, inspired Lichtarge to envision a new challenge for Watson that could help solve a major problem in science.
What if the advanced text-mining capabilities of Watson, a cognitive technology that is better able to process human language than a normal computer, could be applied to comb the vast body of scientific papers in the public medical literature (more than 23 million studies currently available online) to help researchers decide on the best direction for their research based on a more complete view of what is known? Could Watson’s capabilities lead researchers to better questions and to studies with less risk of failure and greater chances of rewarding findings?
Inspired, Lichtarge quickly teamed up with data scientists at IBM to draft a research proposal to help secure funding to initiate the development of such a tool. A project of this magnitude would require a significant amount of money and time invested by the team, but the potential was transformative, Lichtarge said.
But well aware that federal sources were unlikely to fund any project without prior data to support feasibility, Lichtarge pursued other avenues. This led to a partnership with the McNair Medical Institute of the Robert and Janice McNair Foundation, which funded a majority of the project at $1.6 million, and enabled Lichtarge to launch the project.
Just two days after Lichtarge presented the idea to the McNair Medical Institute, the project was funded.
“This gift made all the difference,” Lichtarge explains. “Instead of slowly trying to bootstrap a complex project over years, we were able within a matter of months to assemble a team, set milestones, and produce initial designs, code, and even results.”
The McNair support was all-important. Now 20 months into the project, the team is able to apply Watson’s artificial intelligence software to comb thousands of scientific papers in a matter of minutes. The tool, called the Baylor Knowledge Integration Toolkit, or KnIT, is being trained in an ongoing process to identify biological terms, discern concepts and recognize connections that help formulate new study hypotheses.
In August 2014, Lichtarge and his team took the first step toward making this big idea a reality, one that could someday greatly accelerate the pace of scientific discovery.
In a retrospective case study involving published data on p53, an important tumor suppressor protein, the team showed that KnIT accurately predicted the existence of proteins that modify p53 – proteins that were subsequently found to do just that. More than 70,000 studies have been published on p53. It is impossible for any one scientist to be familiar with all of these studies when deciding where to focus research in this area.
Lichtarge, director of the Center for Computational and Integrative Biomedical Research and The Cullen Foundation Endowed Chair at Baylor, and the principal investigator on the study, presented the work in August in New York City at the 20th annual conference of the Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining, the premier data mining conference.
The Defense Advanced Research Projects Agency (DARPA) also supported a portion of the work.