# Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases

## Abstract

There is an urgent need to make drug discovery cheaper and faster. This will enable the development of treatments for diseases currently neglected for economic reasons, such as tropical and orphan diseases, and generally increase the supply of new drugs. Here, we report the Robot Scientist ‘Eve’ designed to make drug discovery more economical. A Robot Scientist is a laboratory automation system that uses artificial intelligence (AI) techniques to discover scientific knowledge through cycles of experimentation. Eve integrates and automates library-screening, hit-confirmation, and lead generation through cycles of quantitative structure activity relationship (QSAR) learning and testing. Using econometric modelling we demonstrate that the use of AI to select compounds economically outperforms standard drug screening. For further efficiency Eve uses a standardized form of assay to compute Boolean functions of compound properties. These assays can be quickly and cheaply engineered using synthetic biology, enabling more targets to be assayed for a given budget. Eve has repositioned several drugs against specific targets in parasites that cause tropical diseases. One validated discovery is that the anti-cancer compound TNP-470 is a potent inhibitor of dihydrofolate reductase (DHFR) from the malaria-causing parasite Plasmodium vivax.

## Introduction

### Drug screening

We then tested the utility of these assays, and the efficiency of Eve at standard screening, i.e. running in its library-screening and hit-confirmation modes (table 1). We screened the Maybridge Hitfinder library of approximately 14 400 chemically diverse compounds against these assays. This identified numerous hits. A subset of these results was reported in [22].

#### Table 1.

The targets (disease/species/protein/drug-resistant) and libraries screened (May, Maybridge Hitfinder; JH, Johns Hopkins University Clinical Compound Library).

| disease | species | enzyme | drug-resistant | libraries |
| --- | --- | --- | --- | --- |
| malaria | P. falciparum | DHFR | no | May, JH |
| malaria | P. falciparum | DHFR | yes | May, JH |
| malaria | P. falciparum | DHFR | no | May, JH |
| malaria | P. vivax | DHFR | no | May, JH |
| malaria | P. vivax | DHFR | yes | May, JH |
| malaria | P. vivax | DHFR | no | May, JH |
| malaria | P. vivax | PGK | no | May, JH |
| malaria | P. vivax | NMT | no | May, JH |
| Chagas | T. cruzi | DHFR | no | May, JH |
| Chagas | T. cruzi | PGK | no | May, JH |
| Chagas | T. cruzi | NMT | no | May, JH |
| African sleeping sickness | T. brucei | DHFR | no | May, JH |
| African sleeping sickness | T. brucei | PGK | no | May, JH |
| African sleeping sickness | T. brucei | NMT | no | May, JH |
| schistosomiasis | S. mansoni | DHFR | no | May, JH |
| schistosomiasis | S. mansoni | PGK | no | May, JH |
| schistosomiasis | S. mansoni | NMT | no | May, JH |
| leishmaniasis | L. major | DHFR | no | May, JH |
| bacterial infection | S. aureus | DHFR | no | May, JH |

### Drug screening for drug repositioning

We then applied the assays to the challenge of drug repositioning, i.e. the application of known drugs to new diseases (table 1). To do this, we again used Eve in its library-screening and hit-confirmation modes to screen and confirm hits for the above assays, this time using the Johns Hopkins University Clinical Compound Library, which contains approximately 1600 FDA- and foreign-approved drugs. Several repositioned compounds were found that discriminate between host and parasite and have passed initial cytotoxicity tests. To maximize the utility and reuse of these screening data, they are available as open data in Resource Description Framework (RDF) format [24] (electronic supplementary material).

### Repositioning TNP-470 as an anti-malaria compound

The compound TNP-470 was derived from the antimicrobial compound fumagillin (figure 4). TNP-470 is an angiogenesis inhibitor (mediated by its irreversible binding to methionine aminopeptidase-2 (MetAP2)) that has been investigated as an anti-cancer drug. TNP-470 and its analogues have been shown to bind to P. falciparum MetAP2 in vitro, to inhibit growth of P. falciparum strains (including the chloroquine-resistant strains W2 and C2B), and to inhibit parasitaemia in a mouse model [25][26]. Eve's yeast synthetic biology assay results indicate that TNP-470 has high activity against P. vivax DHFR (figure 5). To further confirm that DHFR is an additional target of TNP-470, we performed DHFR enzyme inhibition assays [27]. We observed that P. vivax DHFR was at least 1000-fold more sensitive to TNP-470 than its human counterpart: the drug's IC50 for the parasite enzyme was 0.16 µM, compared with more than 165 µM for human DHFR. This is consistent with the results of Eve's assays and suggests that our approach identified a bona fide DHFR inhibitor with improved selectivity.
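
The fold difference quoted above follows directly from the two IC50 values. The short sketch below reproduces that arithmetic; the function and constant names are illustrative only and are not taken from Eve's software. Because the human value is reported only as a lower bound, the computed ratio is itself a lower bound.

```python
# IC50 values quoted in the text, in micromolar (µM).
IC50_PVIVAX_DHFR_UM = 0.16
# Reported as "more than 165 µM", so this is a lower bound, not a measurement.
IC50_HUMAN_DHFR_LOWER_BOUND_UM = 165.0


def fold_selectivity(ic50_off_target_um: float, ic50_target_um: float) -> float:
    """Return how many times more sensitive the target is than the off-target."""
    if ic50_target_um <= 0 or ic50_off_target_um <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_off_target_um / ic50_target_um


ratio = fold_selectivity(IC50_HUMAN_DHFR_LOWER_BOUND_UM, IC50_PVIVAX_DHFR_UM)
print(f"P. vivax DHFR is at least {ratio:.0f}-fold more sensitive than human DHFR")
```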

Figure 4. The structure of TNP-470.

Figure 5. An Eve hit-confirmation run with four replicates. TNP-470 dose response curves for: yHsDHFRp (red), yPfDHFRp (green) and yPvDHFRp (blue). Normalized growth is calculated by comparison to in-plate negative controls.

DHFR inhibitors are currently routinely used as prophylactics against malaria and are given to over a million children in seasonal malaria chemoprevention. However, DHFR inhibitors are no longer used as a standard treatment because of the evolution of drug resistance [6]. Extensive efforts to discover a second-generation DHFR-targeted anti-malarial drug with efficacy against pyrimethamine-resistant strains have yet to produce a compound that has passed clinical trials [28]. Therefore, the discovery of an approved compound with activity against DHFR is of high potential value. It is also significant that TNP-470 is an example of ‘polypharmacology’ [29], in that it targets both Plasmodium DHFR and MetAP2. This means that it should be pre-hardened against the evolution of drug resistance, as resistance would require simultaneous alteration of both targets.

## Automating drug development

### Automating drug development

We integrated all three of Eve's modes (library-screening, hit-confirmation, intelligent screening) to demonstrate that early stage drug development can be automated, including QSAR generate-and-test cycles. The division of labour between Eve and the human scientists and technicians was as follows. The task was first tightly defined by the humans, who engineered the assays and defined the QSAR problem. This was the extent of human intellectual effort. Human manual effort was required to maintain and run Eve, and to maintain consumables, yeast stocks, etc. Human manual effort was also required to run certain programs during the different stages of the cycles, as some of the steps are not fully integrated; these program steps are predetermined and could, if necessary, be fully automated.

The first full experimental tests of the active learning loop were conducted by splitting the screening data for the heterologous DHFR yeast strains (P. falciparum, P. vivax and human), using 4800 compounds as a training set. The ratio of the yields of the HsDHFR, PvDHFR and PfDHFR strains was passed to the selection algorithm, together with fingerprints of the remaining 9600 compounds. The results from the first ‘cherry-picking’ round (compounds selected by active learning and tested using the hit-confirmation assays) (n = 96; 12 plates of eight compounds per plate; eight replicates of six concentrations) were then added to the original dataset, and a second cherry-picking round was conducted. We used these data to evaluate different approaches to the problem of combining cherry-picking and mass-screening data. The approach based on using the mean of replicates multiplied by log(10/conc.) was found to perform best. We then ran the active learning loop through three iterations: an initial set of 4800 compounds was screened (single iteration, 10 µM), and three loops of 96 cherry-picked compounds (eight replicates, at a range of concentrations) were selected. The mean log-weighted cherry-picking data were cycled back into the training set.
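
The weighting described above (the mean of replicates multiplied by log(10/conc.)) can be sketched as follows. This is a minimal illustration and not Eve's actual code: it assumes a base-10 logarithm, concentrations in µM, and that the 10 in the formula is the 10 µM single-point screening concentration. None of these details is stated explicitly in the text, and all names are hypothetical.

```python
import math
from statistics import mean

# Assumption: the "10" in log(10/conc.) is the 10 µM screening concentration.
REFERENCE_CONC_UM = 10.0


def log_weighted_score(replicates, conc_um):
    """Mean of replicate readings multiplied by log10(10 / conc).

    The base of the logarithm is an assumption; the text only says "log".
    """
    if conc_um <= 0:
        raise ValueError("concentration must be positive")
    return mean(replicates) * math.log10(REFERENCE_CONC_UM / conc_um)


# Hypothetical replicate readings for one compound at two concentrations.
score_at_1um = log_weighted_score([0.8, 0.9, 1.0, 0.9], conc_um=1.0)
score_at_10um = log_weighted_score([0.5, 0.7], conc_um=10.0)
print(score_at_1um, score_at_10um)
```

Under these assumptions a reading taken at 10 µM receives zero weight and readings at lower concentrations receive progressively larger weights, so an effect observed at a low dose counts for more than the same effect at a high dose.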

### Econometric modelling

A thorough investigation of Eve's QSAR active learning methods, comparing intelligent screening with standard brute-force screening, requires the analysis of thousands of cycles. We therefore decided to use our empirical results from running Eve (in library-screening mode) on the complete set of 14 400 compounds of the Maybridge HitFinder library against DHFR assays from multiple parasitic organisms (see above); we considered the Johns Hopkins library to be too small for intelligent screening. The idea was to use these results as an oracle instead of performing new physical experiments. One refinement that we did not investigate was the role of the ordering of compounds in the library: we used a constant random order. It would have been interesting to investigate the use of molecular diversity measures to order compounds for screening, as this would be expected to find hits faster than random screening. In cases where the target has a known structure, it would also have been interesting to investigate in silico screening to order compounds as likely hits.

To quantify the utility of intelligent screening, we developed an econometric model (figure 6). In this model the net utility is the cost saving due to not screening compounds, minus the cost due to missing any hits, minus the cumulative cost of the number of active learning cycles performed. Active learning was applied to the seed input data, and predictions made to produce simulated learning curves. The progression of these learning curves was then compared to the base case of standard library screening. For each 96-compound loop, the utility equation was applied. Figure 7 shows the result of one such run involving many cycles of learning and demonstrates hit enrichment by intelligent screening.
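
The net utility described above can be written as a three-term expression. The sketch below is one plausible reading of that description and not the authors' econometric model: the cost parameters, their names and the example figures are all hypothetical and are not values from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical unit costs, all in the same arbitrary currency."""
    per_compound_screened: float
    per_missed_hit: float
    per_learning_cycle: float


def net_utility(costs, library_size, n_screened, total_hits, hits_found, n_cycles):
    """Saving from unscreened compounds, minus missed hits, minus cycle costs."""
    saving = (library_size - n_screened) * costs.per_compound_screened
    missed = (total_hits - hits_found) * costs.per_missed_hit
    cycles = n_cycles * costs.per_learning_cycle
    return saving - missed - cycles


# Illustrative numbers only: 4800 seed compounds plus three loops of 96.
example_costs = Costs(per_compound_screened=1.0, per_missed_hit=50.0,
                      per_learning_cycle=20.0)
utility = net_utility(example_costs, library_size=14400,
                      n_screened=4800 + 3 * 96,
                      total_hits=100, hits_found=90, n_cycles=3)
print(utility)
```

With this formulation, screening the whole library with no learning cycles has a net utility of zero, which makes it the natural base case against which intelligent screening is compared.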

Figure 6. Modelling the economics of drug discovery. The econometric model of the differential utility of intelligent screening versus mass screening with hit-confirmation.

Figure 7. Intelligent versus random screening. An example simulation run of intelligent screening: cycles of QSAR learning/testing from a compound library. The data are taken from a screen of the Maybridge Hitfinder library against P. vivax DHFR as target (electronic supplementary material). Intelligent screening is shown in red and standard brute-force screening in black. The differential utility of intelligent screening is shown in blue. It can be seen that it is cost-optimal to screen between a third and a half of Eve's small library; with a larger library, the screened proportion would be expected to be smaller. Similar diagrams for the other targets can be found in the electronic supplementary material.

We used the model to investigate a range of costings to determine under what conditions intelligent screening is economically advantageous compared with performing a standard whole-library screen. Figure 8 shows that under most conditions it is economically rational to screen intelligently. Assuming that the probability of a compound being a hit is independent of the size of the library, i.e. that hits are independent and identically distributed (iid) variables, the utility gained from intelligent screening is proportional to the size of the library: larger libraries produce larger savings. The iid assumption is reasonable and is, in large part, the motivation for the collation of the very large libraries currently used for screening. However, it is also conservative, as the difficulties in physically creating structurally diverse libraries mean that the probability of an individual compound being a novel structural hit probably decreases with the size of the library, which means that the savings are probably much greater for large libraries. Therefore, intelligent screening is more cost-effective with larger libraries, more valuable compounds and fast cycles of assay screening and testing. This is the standard regime for pharmaceutical screening, suggesting that adoption of intelligent screening is economically rational.

Figure 8. Summary of utility modelling. Diagram of the maximum utility of intelligent screening taken from a systematic scan of different costs/utilities in the econometric model (a), using the screening results in (b). To make these results comprehensible, we project them down into a three-dimensional graph and combine cost/utilities: time ratio = Tc/Tm and cost ratio = Uh/Cc. This indicates that intelligent screening is generally rational (there is little area of negative utility), and that a high time-ratio (fast screening) and low cost-ratio (valuable library compounds) are most favourable.

## Data and code

We developed a semantic data model of Eve's screening assay results (see electronic supplementary material), where the root node ‘assay triple screen’ represents the main group of data items used to analyse the results. This root node is linked to the node ‘Eve’ via the relation ro:has-agent. The semantics of this association are that Eve initiates and runs the process ‘assay triple screen’. The assay triple screen process has the following inputs (ro:has-input): synthetic yeast strain(s), each of which has a unique identifier and ro:has-part fluorophore and DHFR target; compound, which is represented by a SMILES code and sio:has-identifier compound common name and Maybridge hit finder ID; and plate, which is represented by a code and ro:has-part well-column and well-row to identify each well. The semantics of these associations are that synthetic yeast strains, compounds and a plate participate in the assay triple screen process and are present at the beginning of the process. The assay triple screen process has the following outputs (ro:has-output): venus, sapphire and cherry initial fluorescence in a well; venus, sapphire and cherry final fluorescence in a well; venus, sapphire and cherry doubling time in a well; venus, sapphire and cherry lagtime2 in a well; venus, sapphire and cherry error code in a well. The semantics of these associations are that initial and final fluorescence, doubling time, lagtime2 and error code measurements were produced by the assay process and are present at the end of the process. Additionally, the relation has-target-origin was introduced to link a target and an organism of origin. We included this relation and other entities that are required to define the semantic meaning of Eve data in a small ontology, EVE, that was specially designed to support the semantic data model of Eve's screening assay results (http://disc.brunel.ac.uk/eve.). The node ‘DHFR target’ is linked via this relation to the host (Homo sapiens) and parasites. A target may be drug-resistant. This is expressed via the link sio:has-quality. The dataset is deposited at http://disc.brunel.ac.uk/eve-dataset.
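
To make the shape of this data model easier to see, the relations named above can be written out as subject, predicate, object triples. The sketch below uses plain Python tuples rather than an RDF library; the node labels are paraphrased from the text, the `eve:` prefix on has-target-origin is an assumption, and the deposited dataset may use different identifiers and structure.

```python
# Illustrative (subject, predicate, object) triples paraphrasing the text.
# The "eve:" prefix is assumed; the text names the relation without a prefix.
TRIPLES = [
    ("assay triple screen", "ro:has-agent", "Eve"),
    ("assay triple screen", "ro:has-input", "synthetic yeast strain"),
    ("assay triple screen", "ro:has-input", "compound"),
    ("assay triple screen", "ro:has-input", "plate"),
    ("synthetic yeast strain", "ro:has-part", "fluorophore"),
    ("synthetic yeast strain", "ro:has-part", "DHFR target"),
    ("compound", "sio:has-identifier", "compound common name"),
    ("compound", "sio:has-identifier", "Maybridge hit finder ID"),
    ("plate", "ro:has-part", "well-column"),
    ("plate", "ro:has-part", "well-row"),
    ("assay triple screen", "ro:has-output", "initial fluorescence"),
    ("assay triple screen", "ro:has-output", "final fluorescence"),
    ("assay triple screen", "ro:has-output", "doubling time"),
    ("assay triple screen", "ro:has-output", "lagtime2"),
    ("assay triple screen", "ro:has-output", "error code"),
    ("DHFR target", "eve:has-target-origin", "Homo sapiens"),
    ("DHFR target", "sio:has-quality", "drug-resistant"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`, in list order."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


print(objects("assay triple screen", "ro:has-input"))
print(objects("assay triple screen", "ro:has-output"))
```

In the model described in the text, each output is recorded per fluorophore (venus, sapphire and cherry) and per well; the sketch collapses that detail to keep the structure readable.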

To facilitate the reuse of the code, we have placed all the software (Eve low-level control software, QSAR software and active learning software) on GitHub under the GNU General Public License v. 3 (https://github.com/RobotEve/RobotEve).

## Discussion and conclusion

Eve's standardized assays could easily be engineered for other target classes or target species (e.g. bacteria), for adjunctive targets (e.g. drug import or efflux pumps) or for combinatory functions (e.g. to screen for drug synergies across multiple targets). In addition, the biological realism of the assays could be increased by the incorporation of multiple parasite targets within the same yeast cell, creating increasingly parasite-mimetic and human-mimetic cells. The assays could also be modified to be much faster, as using growth as the read-out limits the speed of executing the assay.

The economics of drug development are influenced by many factors [1][4], some technical (understanding how to intervene to treat a disease, the difficulty of achieving the intervention, etc.), others societal (safety standards, drug pricing, etc.). Although the costs of drug discovery are substantial, they are relatively small compared with those of later stages in development. Such arguments tell against increased automation and standardization in drug discovery making much economic difference. However, they fail to take into account the ‘art of the soluble’ (Sir Peter Medawar). Preventing drug failures in late-stage development is an intrinsically very hard problem, as human biology is very complex. By contrast, we argue that a radical decrease in the cost and increase in the speed of drug discovery could be achieved by the full automation and standardization of procedures. By this, we mean a robotic system that, once given a target, could autonomously develop a standardized assay for that target, screen a compound library using that assay, confirm hit compounds and identify lead compounds through cycles of QSAR learning and testing. This could be achieved today: Eve's synthetic biology assays could be automated using existing technology, and chemical synthesis machines exist that could be integrated with Eve [11]. Such integration would achieve the goal of a robotic system that could autonomously generate hits for targets, and radically decrease the cost and increase the speed of drug discovery.

## Acknowledgements

We would like to thank Mehedi Nahian for help in the conversion of the experimental data to RDF.

## Funding statement

This work was supported by grant BB/F008228/1 from the UK Biotechnology and Biological Sciences Research Council and a contract from the European Commission under the FP7 Collaborative Programme, UNICELLSYS, both to S.G.O. and R.D.K. K.D.G. and J.R. were supported partially by KU Leuven GOA/08/008 and partially by ERC Starting Grant 240186.
