One of the robots at the Broad Institute of Harvard/MIT used to screen hundreds of thousands of molecules for interesting properties.
We design and apply novel algorithms to solve medically important problems in the life sciences. Within this area, my core interest is in designing algorithms to help discover new types of medicines and new uses for existing medicines. This work is currently being conducted in the context of several key collaborations and includes:
high throughput screening analysis
machine learning and probabilistic inference
experimental design using economic and decision theory
Finding new uses for existing medicines (drug “repurposing” or “repositioning”) has emerged as an attractive strategy for rescuing stalled pharmaceutical projects, finding therapies for neglected diseases, and reducing the time, cost and risk of drug development. At the same time that repurposing emerged as a drug development strategy, the data from hundreds of high-throughput screens (HTS), the type of data once locked away in proprietary industry databases, became freely available to all. Small-molecule HTS, however, often misses active molecules because HTS projects are noisy and too narrow. Furthermore, it is difficult to mine public HTS repositories because HTS data is unorganized and its structure undocumented. In order to inform repurposing efforts with information from public screening data, we aim to reduce the number of missed active molecules in small-molecule HTS projects, to organize public HTS data by inferring and displaying each project’s workflow, to identify known drugs likely to be active in the assays of public HTS projects, and to identify the pathways that modulate medically relevant phenotypes.
This effort is significant both because it improves future NIH screening projects and because it finds value missed in deposited projects. Two cases illustrate these points. In the first case, our methods can be applied directly by an NIH screening center. For example, the Broad Institute intends to use our methods at the cherry-pick step, when molecules are selected from the initial screen to be sent for confirmatory testing. This will increase the total number of useful molecules identified in their screening projects. In the second case, independently of the original screeners, we can identify molecules that are likely active in a publicly available screen but were missed in the initial analysis. For just the cost of obtaining these molecules, a fraction of the cost of running the original screen, we can extend the work of the original screeners.
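As a rough illustration of the cherry-pick step, and not a description of the methods used by our lab or by the Broad Institute, the selection can be framed as choosing which molecules to send for confirmatory testing under a fixed budget. The sketch below assumes that each molecule has already been assigned an estimated probability of confirming as active; how that estimate is produced is outside the scope of this example. The function names, molecule identifiers, and probabilities are all hypothetical.

```python
def cherry_pick(prob_active, budget):
    """Pick the `budget` molecules with the highest estimated probability
    of confirming as active.

    prob_active: dict mapping a (hypothetical) molecule ID to an estimated
                 probability in [0, 1].
    budget:      number of confirmatory tests that can be afforded.
    """
    # Sort by descending probability; break ties by ID so the result is stable.
    ranked = sorted(prob_active.items(), key=lambda kv: (-kv[1], kv[0]))
    return [mol for mol, _ in ranked[:budget]]


def expected_confirmed(prob_active, picked):
    """Expected number of confirmed actives among the picked molecules.

    By linearity of expectation this is the sum of the individual
    probabilities, whether or not the outcomes are independent.
    """
    return sum(prob_active[mol] for mol in picked)


# Illustrative data only; these are not real screening results.
scores = {"mol_1": 0.9, "mol_2": 0.2, "mol_3": 0.6, "mol_4": 0.6}
picked = cherry_pick(scores, budget=2)
print(picked, expected_confirmed(scores, picked))
```

Ranking by probability maximizes the expected number of confirmed actives for a fixed number of tests. A fuller treatment grounded in economic and decision theory would also weigh the value of each molecule and the cost of each test.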
Furthermore, this effort is significant because it coherently organizes public screening projects, enabling interpretation and mining of their data. Although several hundred screens are publicly available through PubChem and other repositories, their data is difficult to understand because the sequence of assays and filters (the project’s workflow) is not systematically documented or displayed. Some projects include more than twenty assays and are very difficult to organize and interpret by hand. What should be simple tasks, like recognizing that a set of assays is run in parallel on the same set of molecules, become barriers to understanding the logical design of a project. It is impossible to effectively interpret or mine a project’s results without knowing its design. Therefore, tools for extracting project workflows directly from screening data increase the value of public repositories like PubChem and ChEMBL, overcoming a substantial barrier to mining them.
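To make the idea of recovering a workflow from screening data concrete, the following is a minimal sketch, not our actual method. It assumes that the only information available is which molecules were tested in each assay. Two assays that tested exactly the same molecules are labeled as run in parallel, and an assay that tested a strict subset of another assay's molecules is labeled as downstream of it. The function name, assay names, and molecule IDs are hypothetical.

```python
from itertools import combinations


def relate_assays(tested):
    """Infer pairwise relations between assays from the molecules each tested.

    tested: dict mapping a (hypothetical) assay name to the set of molecule
            IDs tested in that assay.
    Returns a list of (upstream_or_first, downstream_or_second, relation)
    tuples, where relation is "parallel" or "downstream".
    """
    relations = []
    for a, b in combinations(sorted(tested), 2):
        set_a, set_b = tested[a], tested[b]
        if set_a == set_b:
            # Same molecules in both assays: likely run side by side.
            relations.append((a, b, "parallel"))
        elif set_b < set_a:
            # b tested a strict subset of a's molecules: b likely follows a.
            relations.append((a, b, "downstream"))
        elif set_a < set_b:
            # a tested a strict subset of b's molecules: a likely follows b.
            relations.append((b, a, "downstream"))
        # Otherwise the sets only partly overlap or are disjoint;
        # this sketch draws no conclusion for such pairs.
    return relations


# Illustrative data only; these are not real PubChem assays.
project = {
    "primary": {1, 2, 3, 4, 5},
    "confirm_a": {1, 2, 3},
    "confirm_b": {1, 2, 3},
    "dose_response": {1, 2},
}
for relation in relate_assays(project):
    print(relation)
```

Real deposited projects are messier than this: molecule sets rarely nest perfectly, so a practical tool would need to tolerate partial overlaps and reduce the pairwise relations to a readable workflow diagram.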
This work is also applied to identify known drugs likely to be active in medically relevant assays. Each connection between a drug and an assay suggests a new use for that drug. This work is significant because it can rapidly and inexpensively inspire work that can lead to new treatments for patients, even for those with rare and neglected diseases. Our long-range goal is to validate the most plausible hypotheses with appropriate collaborators. Through both academic and industry collaborations, we are exceptionally well positioned to conduct appropriate repurposing studies. In addition to working with the large community of biologists and medical researchers at Washington University, our lab has established collaborations with two pharmaceutical companies, Pfizer and GlaxoSmithKline, with the aim of repurposing their molecules using our approaches.
In some cases, the initial experiment will be to test a drug in the assay used in the original HTS project. Animal studies may then be appropriate. In other cases, the corroborating literature may provide enough evidence either to initiate human trials or to follow relevant biomarkers in an existing trial. Of course, selecting the best predictions to pursue also requires careful consideration of several complicated factors, including marketability, side effects, alternative treatments, intellectual property, disease course, and safety. We will treat these concerns in the context of specific hypotheses.