Biomaterial and drug design is regarded as a very resource-intensive (physical, economic and time) operation 1; the process can be divided into sequential stages (discovery, preclinical, clinical and pharmacovigilance) named Phase 0 to Phase 4. During Phase 0, traditional bench experiments are carried out to identify optimal candidates that are then screened through further developmental stages, while subsequent clinical trials progressively assess toxicity, efficacy and long-term safety (Phase 1 to Phase 4) 2. The overall development can take from 5 to 15 years, with an estimated total development cost per approved drug of $2,168 million in 2018 3. However, actual costs are generally commercially confidential and, therefore, such estimates may not fully capture the investment required 4. The trial-and-error approach to molecule development, particularly during the initial design and make phases of the design-make-test-analyse (DMTA) discovery cycle, is often directed by human intuition, which is inherently biased and limited in knowledge, thus slowing drug development 5. In this context, the ability of data-driven in-silico prediction tools to model outcomes without the need to physically prepare candidates and run experiments would enable high-throughput screening of candidate molecules, thus reducing both the time and the monetary investment required to identify lead candidates 6–9. This can be achieved by establishing correlations between certain properties of the molecules (inputs, also known as descriptors) and outcomes of interest using experimentally generated data on a subset of relevant compounds; the established model would then be used to predict outcomes across the wider molecule search space 10.
Machine learning (ML) based regression techniques are becoming widespread in many areas of data analysis in the chemical 11,12 and pharmaceutical sectors 13–16 and have recently been employed in drug development 17–19, diagnostics 20, treatment algorithm optimisation 21, drug repurposing 2,22 and material discovery 23,24; however, such applications are still quite limited despite being very promising 25,26. Figure 1 depicts how ML could be deployed to accelerate the biomaterial development process. Although ML techniques are flexible, material design and optimisation problems involving numerous parameters are the situations more likely to benefit from the development of ML predictive models.
Osteoarthritis (OA) is a thinning or loss of the cartilage layer covering the surfaces of joints, which reduces articular mobility and causes pain and inflammation. Although OA is not a life-threatening disease, it has a great impact on the quality of life of affected patients and on their ability to perform regular activities, resulting in a great burden on society and health care providers. Worldwide, 303.1 million people live with hip or knee osteoarthritis 27; furthermore, OA prevalence is expected to grow as a consequence of population ageing and overnutrition (two critical risk factors for OA). An effective treatment is still missing, and current therapies (anti-inflammatories and analgesics) only manage symptoms. This lack of therapeutic options is compounded by the inability to deliver the active molecules where they are needed, because the low vascularisation and high electrostatic repulsion of cartilage tissue limit the amount of drug effectively available to the targeted cells 28. In order to achieve drug localisation without a delivery system, high concentrations of drug are used in the synovial fluid, as mass transfer is governed by concentration differences (Fick’s law) 29–31. Such an approach has some problematic drawbacks. Firstly, it is a wasteful use of the drug, as only a minimal amount is actually therapeutic, with consequences for treatment acquisition costs. Secondly, drug washout leads to systemic exposure with possible side effects, as in the case of steroids 32.
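For reference, the concentration-driven transport invoked above can be written in its simplest one-dimensional form as Fick's first law. The symbols are introduced here for illustration only and are not defined in the original text: J is the diffusive flux of drug into the tissue, D the diffusion coefficient of the drug in cartilage, C the local drug concentration and x the depth into the cartilage.

```latex
J = -D \, \frac{\partial C}{\partial x}
```

Under this simplified picture, and for a fixed D, the flux into the tissue can only be raised by steepening the concentration gradient, which is why a high drug concentration in the synovial fluid is needed when no delivery system is used.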
Different drug delivery systems have been developed for the localisation of drugs in cartilage in an attempt to overcome such barriers; poly-beta-amino-esters (PBAEs) 33,34 and avidin 29 are two examples of these delivery systems. No particular optimisation of the performance of the avidin-based delivery system is feasible, as avidin is a well-defined protein; there are, instead, essentially ∞² possible PBAEs, as these are copolymers of an amine and a di-acrylate 35. Moreover, when PBAE end-capping is also considered, the possible combinations rise to ∞³. Because the performance of PBAEs as a cartilage drug delivery system is extremely dependent on the polymer backbone, ML algorithms that predict the efficacy of drug delivery in cartilage from the properties of the polymer’s constituents would provide high-throughput screening for the optimisation of the PBAE-driven cartilage drug localisation technology, reducing the cost and time needed to select the most promising candidate. We have previously demonstrated how the uptake of dexamethasone (DEX) (a drug routinely administered in clinics through intra-articular injections to reduce OA symptoms) in cartilage tissue, through a poly-beta-amino-ester drug delivery system, could be modelled using partial least squares regression 33,34. The inputs of this model are the physical properties of the polymers and co-polymeric units (di-acrylate and amine) along with some experimentally obtained parameters such as the diffusion coefficient of the polymer through cartilage, the drug loading in the delivery system and the molecular weights (Mw and Mn) of the polymer chain 34. Despite its ability to predict uptake, this model still required experimentally generated inputs (such as Mw, Mn and the diffusion coefficient) in order to make predictions on new candidates, and thus could not completely substitute lab-based work.
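The combinatorial growth described above can be illustrated with a minimal sketch. The monomer and end-cap names and the list sizes below are hypothetical placeholders chosen only to keep the example small; the real design space is effectively unbounded, as noted above.

```python
from itertools import product

# Hypothetical placeholder names; not real monomers from any library.
acrylates = [f"acrylate_{i}" for i in range(1, 4)]   # 3 di-acrylates
amines = [f"amine_{i}" for i in range(1, 6)]         # 5 amines
end_caps = [f"endcap_{i}" for i in range(1, 3)]      # 2 end-capping agents

# Each PBAE backbone is one (di-acrylate, amine) pair.
backbones = list(product(acrylates, amines))

# Adding an end-cap multiplies the number of candidates again.
end_capped = list(product(acrylates, amines, end_caps))

print(len(backbones), "backbones;", len(end_capped), "end-capped candidates")
```

Because the candidate count is the product of the component list sizes, even modest monomer libraries quickly exceed what can be synthesised and tested at the bench.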
Through this previous work, we identified a polymer (the current lead candidate, obtained from screening the combinations of 3 acrylates and 15 amines) that increased DEX uptake in cartilage about 8 times compared to the clinical formulation 34. With the purpose of accelerating the optimisation of the PBAE structure for the cartilage delivery system through a systematic screening of a large library of both acrylates and amines, we hypothesised that ML algorithms utilising only predictors available in public libraries or calculated from the compound structure, namely the physico-chemical properties of the PBAE components, could be employed to fully predict the performance of the delivery system without the need for any experimentally originated data. Drug uptake data experimentally obtained from a subset of a large polymer library were utilised to train and optimise 25 machine learning models (e.g. Random Forests, kNN, SVM, neural network and MARS), and their predictive performance was investigated to identify the most accurate algorithm. This model was then employed to screen the PBAE search space, and key features in the amine and acrylate structures were recognised, further elucidating correlations between PBAE structural properties and drug uptake. A further round of ML predictions was conducted on variations of the previously selected core structures to refine and improve efficacy. The most promising candidate identified had an expected 3-fold improvement in efficacy over the previous best-performing candidate.
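The general workflow outlined above (fit several candidate models on a measured subset, keep the one with the best validation error, then rank the unmeasured library) can be sketched as follows. This is not the pipeline used in this work: it is a toy illustration in plain Python in which kNN regressors with different k stand in for the 25 model types, the two descriptors and the uptake values are synthetic, and all function and variable names are hypothetical.

```python
import math
import random

def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return sum(train_y[i] for i in order[:k]) / k

def loo_rmse(X, y, k):
    """Leave-one-out root-mean-square error of a kNN regressor."""
    sq_err = 0.0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        sq_err += (knn_predict(X_tr, y_tr, X[i], k) - y[i]) ** 2
    return math.sqrt(sq_err / len(X))

rng = random.Random(0)

# Synthetic library: two made-up, pre-scaled descriptors per polymer.
library = [[rng.random(), rng.random()] for _ in range(60)]

def toy_uptake(d):
    """Synthetic stand-in for a measured uptake value (not real data)."""
    return 2.0 * d[0] - d[1] + rng.gauss(0, 0.05)

# Pretend only the first 20 polymers were synthesised and measured.
measured_X = library[:20]
measured_y = [toy_uptake(d) for d in measured_X]
unmeasured_X = library[20:]

# Model selection: keep the candidate with the lowest validation error.
scores = {k: loo_rmse(measured_X, measured_y, k) for k in (1, 3, 5, 7)}
best_k = min(scores, key=scores.get)

def predict(d):
    """Predicted uptake for an unmeasured polymer using the selected model."""
    return knn_predict(measured_X, measured_y, d, best_k)

# Screening: rank the unmeasured polymers by predicted uptake, best first.
ranked = sorted(unmeasured_X, key=predict, reverse=True)
print("selected k:", best_k, "top candidate descriptors:", ranked[0])
```

The key design point is that model selection uses only the measured subset, so the ranking of the remaining library relies entirely on descriptors that are available without further experiments.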