By Pharmatrax Author
Category: Technology
The use of artificial intelligence (AI) has been increasing in various sectors of society, particularly the pharmaceutical industry. In this review, we highlight the use of AI in diverse sectors of the pharmaceutical industry, including drug discovery and development, drug repurposing, improving pharmaceutical productivity, and clinical trials, among others; such use reduces the human workload and helps achieve targets in a short period of time. We also discuss crosstalk between the tools and techniques utilized in AI, ongoing challenges and ways to overcome them, and the future of AI in the pharmaceutical industry.
Over the past few years, there has been a drastic increase in data digitalization in the pharmaceutical sector. However, this digitalization comes with the challenge of acquiring, scrutinizing, and applying that knowledge to solve complex clinical problems [1]. This motivates the use of AI, because it can handle large volumes of data with enhanced automation [2]. AI is a technology-based system involving various advanced tools and networks that can mimic human intelligence; at the same time, it does not threaten to completely replace physical human presence [3,4]. AI utilizes systems and software that can interpret and learn from input data to make independent decisions for accomplishing specific objectives. Its applications are continuously being extended in the pharmaceutical field, as described in this review. According to the McKinsey Global Institute, rapid advances in AI-guided automation are likely to completely change the work culture of society [5,6].
AI involves several method domains, such as reasoning, knowledge representation, and solution search, and, among them, a fundamental paradigm of machine learning (ML). ML uses algorithms that can recognize patterns within a set of data that has been further classified. A subfield of ML is deep learning (DL), which engages artificial neural networks (ANNs). These comprise a set of interconnected, sophisticated computing elements known as 'perceptrons', analogous to human biological neurons, which mimic the transmission of electrical impulses in the human brain [7]. ANNs constitute a set of nodes, each receiving a separate input and ultimately converting it to output, either singly or multi-linked, using algorithms to solve problems [8]. ANNs come in various types, including multilayer perceptron (MLP) networks, recurrent neural networks (RNNs), and convolutional neural networks (CNNs), which utilize either supervised or unsupervised training procedures [9,10].
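To make the perceptron idea above concrete, the following is a minimal sketch of a single perceptron unit trained with the classic perceptron learning rule on the logical OR function (a linearly separable toy problem); real ANNs stack many such units into layers and are trained by backpropagation.

```python
# A single perceptron: weighted sum of inputs, threshold activation,
# and the perceptron weight-update rule. Toy illustration only.

def step(z):
    """Threshold activation: fire (1) only if the weighted input is positive."""
    return 1 if z > 0 else 0

def train_perceptron(samples, epochs=10, lr=1.0):
    """Learn weights and bias from (inputs, label) pairs via the perceptron rule."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - pred                # zero when the prediction is correct
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learn logical OR, which is linearly separable, so the rule converges.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(data)
predictions = [step(sum(wi * xi for wi, xi in zip(w, x)) + b) for x, _ in data]
```

After a few epochs the learned weights classify all four OR patterns correctly; a nonseparable problem such as XOR would require a multilayer network.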
The MLP network, with applications that include pattern recognition, optimization aids, process identification, and control, is usually trained by supervised procedures, operates in a single direction only, and can be used as a universal pattern classifier [11]. RNNs are closed-loop networks with the capability to memorize and store information; examples include Boltzmann machines and Hopfield networks [11,12]. CNNs are a series of dynamic systems with local connections, characterized by their topology, and are used in image and video processing, biological system modeling, processing of complex brain functions, pattern recognition, and sophisticated signal processing [13]. More complex forms include Kohonen networks, RBF networks, LVQ networks, counter-propagation networks, and ADALINE networks [11]. Examples of the method domains of AI are summarized in Figure 1.
Several tools have been developed based on the networks that form the core architecture of AI systems. One such tool is the International Business Machines (IBM) Watson supercomputer (IBM, New York, USA). It was designed to assist in the analysis of a patient's medical information and its correlation with a vast database, suggesting treatment strategies for cancer. The system can also be used for the rapid detection of diseases, as demonstrated by its ability to detect breast cancer in only 60 seconds.
The involvement of AI in the development of a pharmaceutical product from bench to bedside can be imagined, given that it can aid rational drug design; assist in decision making; determine the right therapy for a patient, including personalized medicines; and manage the clinical data generated, using it for future drug development. E-VAI is an analytical and decision-making AI platform developed by Eularis, which uses ML algorithms along with an easy-to-use interface to create analytical roadmaps based on competitors, key stakeholders, and currently held market share to predict the key drivers of pharmaceutical sales, thus helping marketing executives to allocate resources for maximum market-share gain, reverse poor sales, and anticipate where to invest. Different applications of AI in drug discovery and development are summarized in Figure 2.
The vast chemical space, comprising >10^60 molecules, fosters the development of a large number of drug molecules. However, the lack of advanced technologies limits the drug development process, making it a time-consuming and expensive task, which can be addressed by using AI. AI can recognize hit and lead compounds, and provide quicker validation of the drug target and optimization of the drug structure design. Different applications of AI in drug discovery are depicted in Figure 3.
Despite its advantages, AI faces some significant data challenges, such as the scale, growth, diversity, and uncertainty of the data. The data sets available for drug development in pharmaceutical companies can involve millions of compounds, and traditional ML tools might not be able to deal with data of this type. Quantitative structure–activity relationship (QSAR)-based computational models can quickly predict large numbers of compounds or simple physicochemical parameters, such as log P or log D. However, these models are far from reliably predicting complex biological properties, such as the efficacy and adverse effects of compounds. In addition, QSAR-based models face problems such as small training sets, experimental errors in training sets, and a lack of experimental validation. To overcome these challenges, recently developed AI approaches, such as DL and relevant modeling studies, can be implemented for safety and efficacy evaluations of drug molecules based on big-data modeling and analysis. In 2012, Merck supported a QSAR ML challenge to observe the advantages of DL in the drug discovery process; DL models showed significantly better predictivity than traditional ML approaches on 15 absorption, distribution, metabolism, excretion, and toxicity (ADMET) data sets of drug candidates [21,22].
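In its simplest form, a QSAR model of the kind described above is just a regression from a molecular descriptor to a property. The sketch below fits a one-descriptor linear model (activity versus a logP-like descriptor) by closed-form least squares; all descriptor and activity values are invented for illustration, and real QSAR models use many descriptors and far larger training sets.

```python
# Minimal QSAR-style sketch: fit activity ~ slope * descriptor + intercept
# by ordinary least squares. All numbers below are hypothetical.

def fit_linear(xs, ys):
    """Closed-form least squares for a single-descriptor linear model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

logp = [1.0, 2.0, 3.0, 4.0]        # hypothetical descriptor values
activity = [2.1, 4.2, 6.0, 8.1]    # hypothetical measured activities
slope, intercept = fit_linear(logp, activity)

def predict(x):
    """Predict activity for a new descriptor value from the fitted model."""
    return slope * x + intercept
```

The same descriptor-to-property mapping, scaled up to thousands of descriptors and nonlinear models, is what the DL approaches in the Merck challenge improved upon.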
The virtual chemical space is enormous and suggests a geographical map of molecules by illustrating the distributions of molecules and their properties. The idea behind the illustration of chemical space is to collect positional information about molecules within the space to search for bioactive compounds and, thus, virtual screening (VS) helps to select appropriate molecules for further testing. Several chemical spaces are open access, including PubChem, ChemBank, DrugBank, and ChemDB.
Numerous in silico methods for the virtual screening of compounds from virtual chemical spaces, along with structure- and ligand-based approaches, provide better profile analysis, faster elimination of nonlead compounds, and selection of drug molecules at reduced expenditure [19]. Drug design algorithms, such as Coulomb matrices and molecular fingerprint recognition, consider the physical, chemical, and toxicological profiles to select a lead compound [23].
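A common ingredient of the ligand-based screening and fingerprint recognition mentioned above is ranking library compounds by fingerprint similarity to a known active. The sketch below models fingerprints as plain sets of "on" bit positions and ranks by the Tanimoto coefficient; the bit assignments and compound names are invented, and real fingerprints are generated by cheminformatics toolkits such as RDKit.

```python
# Tanimoto (Jaccard) similarity between binary molecular fingerprints,
# modeled here as sets of set-bit positions. Toy data throughout.

def tanimoto(fp_a, fp_b):
    """Shared bits divided by total distinct bits; 1.0 means identical."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

query = {1, 4, 7, 9}          # hypothetical fingerprint of a known active
library = {
    "cmpd_A": {1, 4, 7, 8},   # shares 3 of 5 distinct bits with the query
    "cmpd_B": {2, 3, 5},      # shares no bits with the query
}

# Rank library compounds by similarity to the query, most similar first.
ranked = sorted(library, key=lambda c: tanimoto(query, library[c]), reverse=True)
```

In a real screen, the top-ranked compounds would go forward for docking or assay, which is exactly the nonlead-elimination step the text describes.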
Various parameters, such as predictive models, the similarity of molecules, the molecule generation process, and the application of in silico approaches, can be used to predict the desired chemical structure of a compound [20,24]. Pereira et al. presented a new system, DeepVS, for the docking of 40 receptors and 2950 ligands, which showed exceptional performance when 95 000 decoys were tested against these receptors [25]. Another approach applied a multiobjective automated replacement algorithm to optimize the potency profile of a cyclin-dependent kinase-2 inhibitor by assessing its shape similarity, biochemical activity, and physicochemical properties [26].
QSAR modeling tools have been utilized for the identification of potential drug candidates and have evolved into AI-based QSAR approaches, such as linear discriminant analysis (LDA), support vector machines (SVMs), random forest (RF), and decision trees, which can be applied to speed up QSAR analysis [27–29]. King et al. found a negligible statistical difference when the ability of six AI algorithms to rank anonymous compounds in terms of biological activity was compared with that of traditional approaches [30].
The process of discovering and developing a drug can take over a decade and costs US$2.8 billion on average. Even then, nine out of ten therapeutic molecules fail in Phase II clinical trials and regulatory approval [31,32]. Algorithms such as nearest-neighbour classifiers, RF, extreme learning machines, SVMs, and deep neural networks (DNNs) are used for VS based on synthesis feasibility and can also predict in vivo activity and toxicity [31,33]. Several biopharmaceutical companies, such as Bayer, Roche, and Pfizer, have teamed up with IT companies to develop platforms for the discovery of therapies in areas such as immuno-oncology and cardiovascular diseases [19]. The aspects of VS to which AI has been applied are discussed below.
Physicochemical properties, such as solubility, partition coefficient (logP), degree of ionization, and intrinsic permeability of the drug, indirectly affect its pharmacokinetic properties and its target receptor family and, hence, must be considered when designing a new drug [34]. Different AI-based tools can be used to predict physicochemical properties. For example, ML uses large data sets produced during previous compound optimization to train the program [35]. Algorithms for drug design include molecular descriptors, such as SMILES strings, potential energy measurements, electron density around the molecule, and coordinates of atoms in 3D, to generate feasible molecules via a DNN and thereby predict their properties [36].
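To illustrate what "SMILES strings as molecular descriptors" means in practice, the toy scanner below derives a crude descriptor (heavy-atom counts by element) directly from a SMILES string. This is deliberately naive: it recognizes only a handful of element symbols, ignores aromatic lowercase atoms, brackets, rings, and charges, and is no substitute for a real toolkit such as RDKit; it only shows how structured descriptors can be read off the string representation.

```python
# Toy descriptor from a SMILES string: count heavy atoms per element.
# Two-letter symbols (Cl, Br) are matched before one-letter ones so that
# "Cl" is not miscounted as a carbon. A sketch, not a SMILES parser.

ELEMENTS = ["Cl", "Br", "C", "N", "O", "S", "P", "F", "I"]

def atom_counts(smiles):
    """Return {element: count} for the recognized symbols in a SMILES string."""
    counts = {}
    i = 0
    while i < len(smiles):
        for sym in ELEMENTS:
            if smiles.startswith(sym, i):
                counts[sym] = counts.get(sym, 0) + 1
                i += len(sym)
                break
        else:
            i += 1  # skip bonds, branch parentheses, ring digits, etc.
    return counts

# Ethanol is written in SMILES as "CCO": two carbons and one oxygen.
print(atom_counts("CCO"))
```

Counts like these feed into simple property estimates; production descriptor sets (fingerprints, 3D coordinates, electron densities) are richer but follow the same structure-to-vector idea.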
Zang et al. created a quantitative structure–property relationship (QSPR) workflow to determine six physicochemical properties of environmental chemicals from the US Environmental Protection Agency (EPA) Estimation Program Interface (EPI) Suite database [35]. Neural networks based on the ADMET predictor and the ALGOPS program have been used to predict the lipophilicity and solubility of various compounds [37]. DL methods, such as undirected graph recursive neural networks and graph-based convolutional neural networks (CVNN), have been used to predict the solubility of molecules [38].
In several instances, ANN-based models, graph kernels, and kernel ridge-based models have been developed to predict the acid dissociation constant of compounds [35,39]. Similarly, cell lines, such as Madin-Darby canine kidney cells and human colon adenocarcinoma (Caco-2) cells, have been utilized to generate cellular permeability data for a diverse class of molecules, which are subsequently fed to AI-assisted predictors [34].
Kumar et al. developed six predictive models [SVMs, ANNs, k-nearest neighbour algorithms, LDA, probabilistic neural network algorithms, and partial least squares (PLS)] utilizing 745 compounds for training; these were later used on 497 compounds to predict their intestinal absorptivity based on parameters including molecular surface area, molecular mass, total hydrogen count, molecular refractivity, molecular volume, logP, total polar surface area, the sum of E-state indices, solubility index (log S), and rotatable bonds [40]. Along similar lines, RF- and DNN-based in silico models were developed to determine the human intestinal absorption of a variety of chemical compounds [41]. Thus, AI has a significant role in drug development, predicting not only the desired physicochemical properties of a drug but also its desired bioactivity.
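Of the model families listed above, the k-nearest-neighbour classifier is the easiest to sketch: a compound is labeled by majority vote of its nearest training compounds in descriptor space. The miniature example below predicts a binary "well absorbed" label from two descriptors (logP and polar surface area); every number is invented for illustration and bears no relation to the 745-compound training set in the cited study.

```python
# k-nearest-neighbour classification in descriptor space. Toy data only:
# each training point is ((logP, TPSA), absorbed?) with 1 = well absorbed.

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    ranked = sorted(train, key=lambda p: sum((a - b) ** 2
                                             for a, b in zip(p[0], query)))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

train = [
    ((2.0, 40.0), 1), ((1.5, 55.0), 1), ((3.0, 30.0), 1),     # absorbed
    ((-1.0, 140.0), 0), ((0.2, 120.0), 0), ((-0.5, 160.0), 0),  # not absorbed
]

# A lipophilic, low-TPSA query lands among the absorbed training compounds.
label = knn_predict(train, (2.2, 45.0))
```

With k=3 and binary labels a tie is impossible; real pipelines would also normalize descriptors so that no single axis (here TPSA) dominates the distance.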
The efficacy of drug molecules depends on their affinity for the target protein or receptor. Drug molecules that do not show any interaction or affinity towards the targeted protein will not be able to deliver the therapeutic response. In some instances, it might also be possible that developed drug molecules interact with unintended proteins or receptors, leading to toxicity. Hence, drug target binding affinity (DTBA) is vital to predict drug–target interactions. AI-based methods can measure the binding affinity of a drug by considering either the features or the similarities of the drug and its target. Feature-based interactions recognize the chemical moieties of the drug and of the target to determine the feature vectors. By contrast, in similarity-based interaction, the similarity between drug and target is considered, and it is assumed that similar drugs will interact with the same targets [42].
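The similarity-based assumption just described can be sketched directly: score a candidate drug against a target by averaging the known interaction labels of similar drugs, weighted by drug-drug similarity. This is a simplified stand-in for the kernel methods named later (such as KronRLS), and all similarity values, drug names, and labels below are invented.

```python
# Similarity-weighted scoring of a candidate drug against one target:
# drugs similar to the candidate "lend" their known interaction labels.

def predict_interaction(similarities, known_labels):
    """Similarity-weighted average of known labels.

    similarities: {drug: similarity of the candidate to that drug, 0..1}
    known_labels: {drug: 1 if that drug binds the target, else 0}
    Returns a score in [0, 1]; higher means more likely to bind.
    """
    total = sum(similarities.values())
    if total == 0:
        return 0.0
    return sum(similarities[d] * known_labels[d] for d in similarities) / total

sims = {"drug_A": 0.9, "drug_B": 0.8, "drug_C": 0.1}   # to the candidate
binds = {"drug_A": 1, "drug_B": 1, "drug_C": 0}        # known labels, one target
score = predict_interaction(sims, binds)
```

Because the two drugs most similar to the candidate both bind the target, the score lands close to 1; methods such as KronRLS extend this idea by combining drug-drug and target-target similarity matrices in one regularized model.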
Web applications, such as ChemMapper and the similarity ensemble approach (SEA), are available for predicting drug–target interactions [43]. Many strategies involving ML and DL have been used to determine DTBA, such as KronRLS, SimBoost, DeepDTA, and PADME. ML-based approaches, such as Kronecker-regularized least squares (KronRLS), evaluate the similarity between drugs and protein molecules to determine DTBA. Similarly, SimBoost uses regression trees to predict DTBA and considers both feature-based and similarity-based interactions. Drug features from SMILES, the ligand maximum common substructure (LMCS), extended connectivity fingerprints, or a combination thereof can also be considered [42].
DL approaches have shown improved performance compared with ML because they apply network-based methods that do not depend on the availability of the 3D protein structure [43]. DeepDTA, PADME, WideDTA, and DeepAffinity are some DL methods used to measure DTBA. DeepDTA accepts the drug in SMILES format and the protein as its amino acid sequence, working on 1D representations of both the drug structure and the protein [44]. WideDTA is a CVNN-based DL method that incorporates ligand SMILES (LS), amino acid sequences, LMCS, and protein domains and motifs as input data for assessing binding affinity [45].
DeepAffinity and Protein And Drug Molecule interaction prEdiction (PADME) are similar to the approaches described earlier [46]. DeepAffinity is an interpretable DL model that uses both RNNs and CNNs and both unlabeled and labeled data. It takes the compound in SMILES format and represents protein sequences by their structural and physicochemical properties [47]. PADME is a DL-based platform that utilizes feed-forward neural networks to predict drug–target interactions (DTIs). It takes the combined features of the drug and the target protein as input and forecasts the interaction strength between the two; the drug is represented by its SMILES string and the target by its protein sequence composition (PSC) [46]. Unsupervised ML techniques, such as MANTRA and PREDICT, can be used to forecast the therapeutic efficacy of drugs and the target proteins of known and unknown pharmaceuticals; they can also be extrapolated to drug repurposing and to interpreting the molecular mechanisms of therapeutics. MANTRA groups compounds based on similar gene expression profiles using the CMap data set and clusters compounds predicted to share a common mechanism of action and a common biological pathway [43]. The bioactivity of a drug also includes ADME data. AI-based tools, such as XenoSite, FAME, and SMARTCyp, are involved in determining the sites of metabolism of a drug. In addition, software such as CypRules, MetaSite, MetaPred, SMARTCyp, and WhichCyp has been used to identify the specific isoforms of CYP450 that mediate the metabolism of a particular drug. The clearance pathways of 141 approved drugs have been predicted by SVM-based predictors with high accuracy [48].
Read more: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7577280/