Application of neural networks in the identification of pharmaceutical lead hits

Artificial neural networks (ANNs), often described as digitalized models of the mammalian brain, are used in pattern recognition tasks, including machine translation and speech recognition. In addition, some attempts have been made to implement them in new drug design and discovery. Presently, the drug development process relies on complex approaches to identify a single lead hit, which often fails repeatedly because of poor pharmacokinetic properties and severe side effects. This could be the outcome of poor screening approaches for hit candidates and neglect of probable bias. ANNs can support decision-making for researchers as well as in clinical application. For medicinal chemists, they may act as an effective tool to pick a lead hit and to predict the 3D protein conformation in order to evaluate the effectiveness of the selected lead hit. The present study briefly reviews ANNs as a predictive tool for disease classification, in vitro and in vivo data correlation, target identification, and absorption, distribution, metabolism, excretion, and toxicity profiling, and their application in modern drug discovery.


INTRODUCTION TO THE NEURAL NETWORK
The development of various machine learning algorithms, e.g., naive Bayes classifiers, logistic regression, support vector machines, and artificial intelligence systems, has contributed to a revolution in new drug design, which is also supported by in silico molecular docking and simulation, protein structure validation, molecular energy prediction, and virtual screening (Yang et al., 2019a, 2019b). In addition, the rapid development of neural network models, driven by the availability of large datasets, has improved multiple calibration techniques (Lopes et al., 2019; Schmidt et al., 2019). Furthermore, using large data series, artificial neural network (ANN) models are adopted to correlate dependent and independent variables (Pasini, 2015).
A basic neural network comprises an input, a hidden, and an output layer, and its basic unit is similar to a human neuronal cell (Jothilakshmi and Gudivada, 2016). The input layer connects to the hidden layer and transmits the signals, termed $x_1, x_2, x_3, x_4, \ldots, x_n$, with their respective weights $w_1, w_2, w_3, w_4, \ldots, w_n$. The weight attached to each signal reflects the function and magnitude of that signal. The weighted sum of the input signals, i.e., $x_1 w_1 + x_2 w_2 + x_3 w_3 + x_4 w_4 + \ldots + x_n w_n$, is processed within the hidden layer and the output is obtained. The basic node of the neural network can be represented mathematically as $y = f(I) = f\left(b + \sum_{i=1}^{n} x_i w_i\right)$, where $b$ is the bias, $x_i$ is the $i$th input to the neuron, $w_i$ is the corresponding weight, $n$ is the number of inputs from the incoming layer, and $i$ is a counter running from 1 to $n$; this is similar to the human neural system (Figure 1).
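As a minimal sketch of the node equation above, the following Python snippet computes the output of one neuron as the activation of the bias plus the weighted sum of the inputs. The function names, the choice of a sigmoid for f, and the numeric values are illustrative assumptions and are not taken from the cited works.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, used here as one possible choice for f."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Return y = f(b + sum_i x_i * w_i) for a single node."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = bias + sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)


# Illustrative values only (assumed for this example).
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1
y = neuron_output(x, w, b)  # weighted sum is 0.1 + 0.2 - 0.3 - 0.4 = -0.4
```

Passing a different `activation` (for example `lambda z: z`) returns the raw weighted sum, which makes it easy to compare activation choices on the same inputs.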

Weights
A weight is a numerical value attached to the link connecting two neurons (Dudek, 2017; Georgevici and Terblanche, 2019), and the influence of an input on the output grows with the magnitude of its weight. A higher weight therefore marks a more important input signal; the weights can be arranged in matrix format, as shown in Figure 1.

Bias
A bias is a number that lets the network account for factors not captured by the inputs. When making a decision, one needs to consider all the possible or observable factors (Dudek, 2017); however, there could be multiple parameters or variables that go unnoticed. The neural network attempts to incorporate these unnoticed parameters or variables in the form of the bias.

Activation
A neuron makes a small decision about its own output, which is called activation. It can be represented as f(z), where z is the aggregate of all inputs, and can be categorized as binary, linear, or nonlinear (Dudek, 2017; Georgevici and Terblanche, 2019). In binary activation, depending on whether the input value is above or below a basal threshold (also considered the resting state of the neuron), the neuron is either activated and sends a signal to the next layer or remains silent. This function cannot produce multi-valued output, so categorizing inputs into more than two classes is not possible with this activation. It can therefore be regarded as a qualitative analysis, as it provides the result as "yes" or "no" (Feng and Lu, 2019).
In a linear activation function, the inputs are multiplied by their respective weights to produce an output signal that is proportional to the input. Unlike binary activation, it provides multiple output values. Mathematically, if f(z) = z, then f(z) is called a linear activation, meaning that the weighted sum is passed on unchanged. However, it has two major limitations: it cannot be used effectively with backpropagation to train the model, because its derivative is constant and carries no information about the input, and all the layers collapse into one (Feng and Lu, 2019).
Similarly, nonlinear activation produces a complex mapping between the inputs and outputs to model complex data. Nonlinear activation addresses the problems associated with linear activation: it permits backpropagation and allows layers to be stacked into a deep neural network. Activation functions widely used in ANNs include sigmoid, tanh, ReLU, leaky ReLU, parametric ReLU, softmax, and swish (Feng and Lu, 2019).
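The collapse of stacked linear layers can be checked numerically. The sketch below assumes scalar layers for simplicity (all function names are hypothetical) and shows that two layers with f(z) = z give the same output as a single layer with weight w2·w1 and bias w2·b1 + b2.

```python
def linear_layer(x, w, b):
    """One layer with linear activation f(z) = z."""
    return w * x + b


def two_linear_layers(x, w1, b1, w2, b2):
    """Feed the output of the first linear layer into the second."""
    return linear_layer(linear_layer(x, w1, b1), w2, b2)


def collapsed_layer(x, w1, b1, w2, b2):
    """Single equivalent layer: w2*(w1*x + b1) + b2 = (w2*w1)*x + (w2*b1 + b2)."""
    return linear_layer(x, w2 * w1, w2 * b1 + b2)


# Illustrative parameters (assumed values).
params = dict(w1=0.7, b1=-0.2, w2=1.5, b2=0.3)
differences = [
    abs(two_linear_layers(x, **params) - collapsed_layer(x, **params))
    for x in (-2.0, 0.0, 0.5, 3.0)
]
```

Every entry of `differences` is zero up to floating-point rounding, so adding further linear layers does not increase what the network can represent.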
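For reference, the activation functions named above can be written in a few lines each. This is a plain-Python sketch using only the standard library; the default threshold of 0 and the leaky slope of 0.01 are common conventions assumed here rather than values from the cited works. Parametric ReLU has the same form as leaky ReLU, except that the slope is learned during training.

```python
import math


def binary_step(z, threshold=0.0):
    """Binary activation: fires (1) at or above the threshold, else 0."""
    return 1.0 if z >= threshold else 0.0


def sigmoid(z):
    """Squashes z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes z into the interval (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Rectified linear unit: passes positive values, zeroes the rest."""
    return max(0.0, z)


def leaky_relu(z, slope=0.01):
    """Like ReLU, but keeps a small slope for negative inputs."""
    return z if z > 0 else slope * z


def swish(z):
    """Swish: the input scaled by its own sigmoid."""
    return z * sigmoid(z)


def softmax(zs):
    """Turns a list of scores into probabilities that sum to 1."""
    shift = max(zs)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]
```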

ANN AS A PREDICTIVE TOOL
An ANN computes output values from its inputs; in the simplest binary case, the signal is assigned as −1 or 1 by comparison with 0, functioning like the neuron system of the human brain. Feedforward and backpropagation are the learning characteristics of an ANN, through which it assesses its past errors and experiences in a parallel processing manner (Agatonovic-Kustrin and Beresford, 2000). Feedforward describes processing and pattern recall, whereas backpropagation describes the training of the neural network, which depends on the sample inputs and their expected outputs: the predicted output is compared with the actual output for a given input. This tool can be used to analyze large biological datasets and can be designed around these learning and processing characteristics. Since the output is the outcome of collective behavior arising from the interconnections between nodes, the network evaluates its own output and makes a secondary decision, based on the input signal and the calculated weighted sum, by comparing it to the threshold. The calculations in an ANN are carried out in the hidden layers, and the algorithms set in them are based on the target or non-target type of the problem, which decides the neural pattern of learning. In terms of processing, each input signal is multiplied by its corresponding weight and the products are summed before activation. By comparison, in the human neuron the dendrites receive information, the soma processes it, and the axon delivers the output. Hence, the ANN is an algorithm-based neural system in which information is received, processed, and delivered, similar to a human neuron. Therefore, it can be used to handle large sets of biological data, generate an outcome, and predict probable results.
ANNs combined with evolutionary algorithms have been implemented in the medical field for multiple pathologies, including Alzheimer's disease, cardiovascular disease, gastroenterology, and amyotrophic lateral sclerosis (Grossi, 2006; Street et al., 2008). Although the implementation of ANNs is an emerging trend in medical science, the methods are not yet widely used; however, they have had a clinical impact in specific areas, including the early detection of acute myocardial infarction, X-ray mammography, and cervical cytology. The extensive implementation of ANNs in this field has been detailed by Lisboa (2002).
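The interplay of feedforward and backpropagation described above can be illustrated with the smallest possible case, a single sigmoid neuron trained by gradient descent. The toy dataset (the logical OR of two inputs), the learning rate, the number of epochs, and all names are assumptions made for this sketch; a practical network would have hidden layers and far more data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feedforward(x, weights, bias):
    """Processing step: weighted sum of the inputs followed by activation."""
    return sigmoid(bias + sum(xi * wi for xi, wi in zip(x, weights)))


def mean_squared_error(samples, weights, bias):
    """Average squared gap between predicted and expected outputs."""
    return sum((feedforward(x, weights, bias) - t) ** 2 for x, t in samples) / len(samples)


def train(samples, epochs=5000, learning_rate=0.5):
    """Training step: compare prediction with target and adjust the weights."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = feedforward(x, weights, bias)
            # Gradient of 0.5 * (y - target)**2 with respect to the weighted
            # sum, obtained with the chain rule through the sigmoid.
            delta = (y - target) * y * (1.0 - y)
            weights = [wi - learning_rate * delta * xi for wi, xi in zip(weights, x)]
            bias -= learning_rate * delta
    return weights, bias


# Toy data (assumed): output is 1 if at least one input is 1.
toy_samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
initial_error = mean_squared_error(toy_samples, [0.0, 0.0], 0.0)
trained_weights, trained_bias = train(toy_samples)
final_error = mean_squared_error(toy_samples, trained_weights, trained_bias)
```

Comparing `initial_error` with `final_error` shows the effect of training; in a multilayer network the same error signal is propagated backward through each hidden layer in turn.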

ANN AS A DISEASE CLASSIFIER
An ANN can handle a large dataset to identify and characterize a subject (Ahmadi et al., 2013; Pasini, 2015). For this purpose, actual or sample data are supplied to train and adjust the network with respect to probable errors, which are further validated against a standard scale (Michael et al., 2000). The network is triggered to proceed, loop, or stop processing, and it can be adjusted by manipulating the hidden layer if the outcome is unsatisfactory.
Let us explain this with a real-life example. Suppose a professor asks a student to explain type 1 and type 2 diabetes mellitus. The student starts by considering a few keywords that differentiate these two pathogenic conditions, such as "insulin-deficient or insulin-resistant, any age or mostly in adults, sudden onset or gradual onset, common or rare ketosis, less or more prevalence, presence or absence of antibodies, and …". An algorithm can then be written to classify type 1 and type 2 diabetes mellitus based on the above parameters, assess the prediction through the mean square error and confusion matrices, and judge the output value against the threshold. However, diabetes mellitus is a polygenic state in which multiple genes or proteins are involved in the development and progression of the disease. This means that the pathogenesis of diabetes is not fully understood, which may lead to an error that must be accounted for as bias in the hidden layer.
Classification can be very demanding when the pathogenesis is complex; however, an ANN can classify complex diseases like cancer, which are hard to distinguish. It can also help determine an individual's predisposition to a particular pathogenesis through disease risk prediction based on the evaluation of a specific gene or mutation (Azati Team, 2021). Furthermore, an attempt was made to use an ANN to diagnose heart disease, in which a feedforward neural network consisting of 13 input neurons, 20 neurons in the hidden layer, and 1 output neuron was used to classify the presence or absence of the disease. The number of neurons in the hidden layer was fixed by trial and error, and the network was able to diagnose or classify 88% of the cases in the given training set (Ajam, 2015).
In general, by incorporating multiple parameters of the disease, algorithms can be designed to predict the disease's severity and its complications. This may help in writing algorithms that classify the disease as mild, moderate, or severe, which may be implemented in rapid decision-making.
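As a sketch of how such a classifier might be judged, the snippet below computes the mean square error and a two-class confusion matrix for network outputs compared against a threshold. The label encoding (1 for type 1, 0 for type 2), the threshold of 0.5, and the six example outputs are invented for illustration and are not clinical data.

```python
def mean_squared_error(targets, outputs):
    """Average squared difference between true labels and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def confusion_matrix(targets, outputs, threshold=0.5):
    """Count true/false positives and negatives after thresholding the outputs."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for target, output in zip(targets, outputs):
        predicted = 1 if output >= threshold else 0
        if predicted == 1 and target == 1:
            counts["TP"] += 1
        elif predicted == 1 and target == 0:
            counts["FP"] += 1
        elif predicted == 0 and target == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


# Hypothetical labels (1 = type 1, 0 = type 2) and network outputs.
targets = [1, 1, 0, 0, 1, 0]
outputs = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
mse = mean_squared_error(targets, outputs)
matrix = confusion_matrix(targets, outputs)
accuracy = (matrix["TP"] + matrix["TN"]) / len(targets)
```

With these made-up values, four of the six cases fall on the correct side of the threshold; raising or lowering `threshold` trades false positives against false negatives.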
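To make the 13-20-1 layout concrete, the following sketch runs one forward pass through a network of that shape. Only the layer sizes come from the text; the sigmoid activations, the random untrained weights, the placeholder patient features, and the 0.5 decision threshold are assumptions, so the resulting score has no diagnostic meaning.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_inputs, n_neurons, rng):
    """Random (untrained) weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def dense_layer(inputs, weights, biases):
    """Each neuron takes the weighted sum of all inputs plus its bias."""
    return [
        sigmoid(b + sum(x * w for x, w in zip(inputs, row)))
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
hidden_w, hidden_b = init_layer(13, 20, rng)  # 13 inputs -> 20 hidden neurons
output_w, output_b = init_layer(20, 1, rng)   # 20 hidden neurons -> 1 output

patient = [rng.random() for _ in range(13)]   # placeholder features scaled to [0, 1]
hidden = dense_layer(patient, hidden_w, hidden_b)
score = dense_layer(hidden, output_w, output_b)[0]
disease_present = score >= 0.5
```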

APPLICATION FOR IN VITRO AND IN VIVO DATA CORRELATION (IVIVC)
One of the important criteria for drug choice is the subject's demographic data, which affect the pharmacokinetic-pharmacodynamic (PK-PD) profile. The data obtained in vitro can be correlated with in vivo data via an ANN (Rackley, 1996). The use of ANNs in the field of PK-PD has been described previously; it was based on a backpropagation learning algorithm and was used as a Bayesian classifier with simulated data. Attempts have been made to correlate in vitro-in vivo findings for controlled-release dosage formulations (de Matas et al., 2007, 2008; Parojčić et al., 2007).
Additionally, it is important to determine the drug's pharmacokinetic (in vivo) profile based on the in vitro dissolution together with other important variables. In IVIVC, it has been observed that the input-output relationship may be independent of the internal configuration of the model, provided the model is validated, which may play an important role in product development and in establishing dissolution specifications (Caramella et al., 1993; Graffner et al., 1984; Sullivan et al., 1976). In this regard, multiple simple linear models can be used to describe pharmacokinetic absorption; however, sometimes no correlation is observed. This could be due to unidentified or unconsidered factors, which can be termed bias, meaning that the model may not account for some physiological rate-determining process, and to probable built-in variables contributing to the parameters of the modeled relationship (Barr et al., 1994; Levy and Hollister, 1964; Wood et al., 1990).
First, let us look at the conventional method for IVIVC. It involves (a) preparing formulations with different in vitro drug release profiles; (b) evaluating these formulations in vivo by an appropriate route of administration (p.o. or i.v.) with suitable references; (c) using an appropriate procedure to estimate in vivo drug release; and (d) developing a suitable pharmacokinetic function to relate the in vitro and in vivo data. Furthermore, if the process fails to establish an IVIVC, multiple mechanistic or empirical functions are considered (Hussain, 1997).
In the ANN approach, a backpropagation-based network can be used for an IVIVC, in which the data used for the training sets play an important role. A few approaches have been reported in which the in vivo efficacy of inhalers was predicted from in vitro data via an ANN. Furthermore, it was suggested that the model could be improved by considering larger datasets, more subjects, and other input variables that directly affect inhalation (de Matas et al., 2008). An ANN can also be used to correlate in vitro-in vivo data for the metabolic clearance and dissolution kinetics of newly identified drug molecules (Dowell et al., 1999; Elçiçek et al., 2014; Lavé et al., 1999; Schneider et al., 1999). One of the major benefits of an ANN is that it provides probable preliminary information on drug behavior without conducting in vivo experiments. Furthermore, the IVIVC model can be designed to predict the behavior of pharmaceutical formulations by using multiple physicochemical characteristics (Mendyk et al., 2013). This helps to understand the bioavailability of drug formulations at preliminary stages. Dowell et al. (1999) applied an ANN to correlate the findings for an extended-release formulation. The initial training set covered 2 formulations and 1,512 pharmacokinetic time points from 9 patients enrolled in a cross-sectional study. The authors evaluated 29 ANN configurations, whose structures included feedforward, recurrent, jump connection, and general regression neural networks with different input-output association types, and each configuration was evaluated based on its predictive performance.
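Step (d) of the conventional method is often a simple linear function relating the fraction of drug dissolved in vitro to the fraction absorbed in vivo at matching time points. The sketch below fits such a line by ordinary least squares and reports the coefficient of determination; the five data pairs are synthetic values made up for illustration, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic example data (assumed): fractions at matching time points.
fraction_dissolved = [0.1, 0.3, 0.5, 0.7, 0.9]      # in vitro
fraction_absorbed = [0.08, 0.27, 0.49, 0.66, 0.88]  # in vivo

slope, intercept = fit_line(fraction_dissolved, fraction_absorbed)
r2 = r_squared(fraction_dissolved, fraction_absorbed, slope, intercept)
```

When such a line fits poorly, that is the situation in which the text suggests turning to mechanistic or empirical alternatives, or to a trained network.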

PREDICTION FOR BLOOD-BRAIN BARRIER (BBB) PERMEABILITY
Approaches to predict the BBB permeability of compounds using ANNs were made by Garg and Verma (2006) and Chen et al. (2009). Molecules designed to manage psychiatric illnesses should cross the BBB, which is composed of astrocytes, endothelial cells, and pericytes. Although multiple in vitro and in vivo approaches are used to assess the ability of a drug to penetrate the BBB, in silico prediction could be more effective, as it is less expensive and easier to handle. Multiple in silico tools and online servers like admetSAR (Yang et al., 2019a, 2019b) are used to predict the oral bioavailability and BBB permeability of molecules. Building on these, ANNs could serve as an upgraded computational and decision-making tool for obtaining preliminary data, as they are pattern recognizers (Kim, 2010). Multiple input signals reflecting the molecular weight, lipophilicity, and polar surface area of a molecule can be considered to provide a BBB permeability score as the functional output. Furthermore, additional parameters of the drug-like substance, such as hydrogen bond donors/acceptors, may be considered to avoid bias. Further parameters like plasma protein binding, disease presence (e.g., hyper- or hypotension), and age may affect BBB permeability and can be considered to refine the algorithm and provide a better BBB permeability score.
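A minimal sketch of how the three descriptors mentioned above could feed a single output node is given below. The scaling constants, weights, bias, and the example molecule are hypothetical placeholders chosen only to make the code run; in practice the weights (including their signs) would be learned from measured permeability data, and the published models use richer architectures.

```python
import math

# Hypothetical scaling constants that bring each descriptor to a similar range.
SCALES = {"molecular_weight": 500.0, "log_p": 5.0, "polar_surface_area": 140.0}

# Hypothetical, untrained weights and bias (placeholders, not fitted values).
WEIGHTS = {"molecular_weight": -1.0, "log_p": 1.5, "polar_surface_area": -2.0}
BIAS = 1.0


def bbb_score(descriptors):
    """Return a score in (0, 1) from a weighted sum of scaled descriptors."""
    z = BIAS + sum(
        WEIGHTS[name] * descriptors[name] / SCALES[name] for name in WEIGHTS
    )
    return 1.0 / (1.0 + math.exp(-z))


# Made-up example molecule.
example_molecule = {
    "molecular_weight": 300.0,   # g/mol
    "log_p": 2.5,                # lipophilicity
    "polar_surface_area": 60.0,  # square angstroms
}
score = bbb_score(example_molecule)
```

Extra inputs such as hydrogen bond donor and acceptor counts or plasma protein binding could be added as further keys in the three dictionaries.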

ROLE OF NEURAL NETWORKS IN TARGET IDENTIFICATION AND VALIDATION
A target can be up- or downregulated to manage a disease in an attempt to restore the homeostatic condition, and it is an important criterion for introducing new therapeutic agents (Schenone et al., 2013). Furthermore, multiple protein modeling and simulation approaches are also used to identify a target for a disease (Schmidt et al., 2014). During target identification, antitargets are also considered; these should not be modulated, as doing so may lead to side or adverse effects owing to their involvement in the homeostatic regulation of cell, tissue, or organ function (Roman et al., 2018).
One of the concepts to be considered for identifying the drug target or predicting the biological spectrum is the probable activity and inactivity of the hit agent (Lagunin et al., 2000). To understand this, consider the phrase "master key unlocks multiple locks." Similarly, a drug may modulate multiple proteins, depending on its lipophilicity, molecular weight, and surface area or ligand volume, followed by its interaction with the active site, etc. Let us suppose that compound "X" targets two proteins, "A" and "B," and that the probable activity of "X" for "A" is higher than for "B." Now, target "A" is more dominant in disease "C" and "B" in "D." Furthermore, let us assume that diseases "C" and "D" are independent. This means that if a target is to be identified for drug "X," then target "A" should be considered to manage disease "C." If "X" is used for disease "D," then undesirable effects may occur, which can be termed side effects of drug "X" in the management of disease "D" by targeting protein "B." In this case, one should weigh the probable activity of the drug molecule against its probable inactivity. The next step is to incorporate this idea in machine learning. For this purpose, counterpropagation neural networks, Bayesian neural networks, and support vector machine algorithms can be used for target identification and validation. However, the output depends on the inputs considered, the level of bias, and the algorithm used. Output prediction could be more convenient with mathematical packages like MATLAB (https://www.mathworks.com/products/matlab.html) and Mathematica (https://www.wolfram.com/mathematica/), and purpose-designed packages like Neural Lab (https://neurallab.io/) and JavaNNS (http://www.ra.cs.uni-tuebingen.de/software/JavaNNS/manual/JavaNNS-manual.html).
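The compound "X" example can be written as a small ranking routine. In this sketch, each target is given a probable activity (Pa) and probable inactivity (Pi) value; the numbers, the extra protein "E," and the rule of ranking by Pa minus Pi while discarding targets with Pa not above Pi are assumptions made for illustration, not a procedure prescribed by the cited works.

```python
def rank_targets(predictions):
    """Rank targets by Pa - Pi, keeping only those where Pa exceeds Pi."""
    plausible = {
        name: pa - pi for name, (pa, pi) in predictions.items() if pa > pi
    }
    return sorted(plausible, key=plausible.get, reverse=True)


# Hypothetical (Pa, Pi) pairs for compound "X" against three proteins.
compound_x = {
    "A": (0.82, 0.03),  # high probable activity: candidate primary target
    "B": (0.41, 0.12),  # weaker activity: possible source of side effects
    "E": (0.10, 0.35),  # probable inactivity dominates, so it is dropped
}
ranking = rank_targets(compound_x)
primary_target = ranking[0]
```

The same (Pa, Pi) pairs could equally serve as input signals to one of the network types named above instead of a fixed ranking rule.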

ROLE OF NEURAL NETWORK IN HIT IDENTIFICATION AND ABSORPTION, DISTRIBUTION, METABOLISM, EXCRETION, AND TOXICITY (ADMET) PROFILING
Among a series of compounds, the hit molecule can be taken as the one with the highest pharmacological activity (depending on the probable activity and inactivity), the best drug-likeness score, and the best ADMET profile. This means that the predicted pharmacokinetic and pharmacodynamic data may play an important role in identifying lead hits against a disease. The question is then how to define the lead hit and which input signals should be considered. In this regard, ADME data can be considered together with toxicity parameters like hERG inhibition, cardiotoxicity, hepatotoxicity, ototoxicity, neurotoxicity, nephrotoxicity, and Ames mutagenicity, which can be freely predicted using open-source predictors like admetSAR (Yang et al., 2019a, 2019b). Furthermore, the probable activity of a molecule against a target can be used to predict the biological spectrum against the identified disease. Decision-making algorithms can also be written to provide the output signal. However, as stated, identifying the lead hit from a series of compounds can be complicated, as even a single step involves a complex algorithm to process the input signals. For example, to generate the input signals for drug absorption, all the factors that affect drug absorptivity have to be considered. A simplified form of a neural network that can be considered to identify the lead hit is shown in Figure 2.
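One simple way to express the selection criteria above as a decision-making algorithm is a filter followed by a ranking. The sketch below is a rule-based stand-in rather than a neural network: the `Candidate` class, the rule of rejecting any compound with a predicted toxicity flag, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    probable_activity: float  # predicted activity against the target, 0 to 1
    drug_likeness: float      # drug-likeness score, higher is better
    toxicity_flags: dict = field(default_factory=dict)  # e.g. {"hERG": True}


def select_lead_hit(candidates):
    """Drop compounds with any predicted toxicity, then pick the most active.

    Ties in probable activity are broken by the drug-likeness score.
    Returns None when no compound passes the toxicity filter.
    """
    safe = [c for c in candidates if not any(c.toxicity_flags.values())]
    if not safe:
        return None
    return max(safe, key=lambda c: (c.probable_activity, c.drug_likeness))


# Made-up compound series.
series = [
    Candidate("cmpd-1", 0.91, 0.60, {"hERG": True, "Ames": False}),
    Candidate("cmpd-2", 0.78, 0.85, {"hERG": False, "Ames": False}),
    Candidate("cmpd-3", 0.78, 0.40, {"hERG": False, "Ames": False}),
]
lead = select_lead_hit(series)
```

Here the most active compound is rejected because of its predicted hERG liability, which mirrors the point that activity alone does not define a lead hit.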

APPLICATION IN MODERN DRUG DISCOVERY
ANNs are described as "digitalized models of the brain" because they are complex, exploit nonlinear relationships, and have a basic anatomy similar to that of human neurons (Zador, 2019). Thus, they are important in drug discovery and development through the proper use of virtual screening, quantitative structure-activity relationship (QSAR) studies, mathematical modeling, pharmacophore identification, in silico molecular docking, and ADMET prediction. Virtual screening may help to predict the biological spectrum of lead molecules (Ekins et al., 2007; Tang and Marshall, 2011). Furthermore, machine learning has also been successfully implemented in the discovery of modern medicines for target identification (gene-disease association, identification of splice variants, and target druggability prediction), compound design (reaction planning, ligand-based drug design), prediction of biomarkers (tissue-specific biomarkers, drug-response signatures), and determination of drug response (cellular phenotyping and microenvironment measurement) (Vamathevan et al., 2019). In this regard, Bayesian neural networks can be used to identify biomolecules that act on brain- and cardiovascular-related pathogenesis. The binding site of a receptor can be predicted via pharmacophore modeling (Huang et al., 2018), in which the active site and its geometry play an important role and can be incorporated to predict the molecular surface and create a 2D feature map.
In silico molecular docking also helps to identify a suitable pose of the ligand with its target (Meng et al., 2011). The ligand binds to the given target with some energy or affinity, expressed in kcal/mol in AutoDock tools, and also interacts with amino acid residues, e.g., through hydrogen bond or pi bond interactions. After docking, different poses of the ligand molecule are obtained, from which the pose with the lowest binding energy is chosen to identify the ligand-protein interaction. Hence, this conformation and its binding affinity can be used to train the neural network to identify regulators of the protein; that is, the pose with the minimum binding energy can serve as the input signal to the ANN. Furthermore, the ADMET profile also plays an important role in the drug development steps to evaluate the probable pharmacokinetic profile of lead candidates (Morgan, 2011). Multiple in silico tools can be used to identify the probable toxicity and assess the probable ADMET behavior of biomolecules. Drug sensitivity, chemical-genetic association, and the structure-activity relationship can be predicted via regression analysis methods, including decision trees, principal components, and linear, partial least squares, and Gaussian process regression. Likewise, drug-target association and tissue-specific biomarkers can be traced via classifier methods, including natural language processing kernel methods, gradient boosting, Bayesian classifiers, nearest neighbor, and discriminant analysis. Additionally, single-cell information, image analysis, and biomarker assessment can help in assessing target druggability via clustering methods, including generative adversarial networks, Gaussian mixtures, k-means, and hierarchical clustering (Figure 3).
Owing to its ability to perform tasks based on the trained dataset, an ANN can self-correct errors, organize and store the learned information, and perform faster data integration and retrieval (Mandlik et al., 2016). Additionally, ANNs can investigate complex and nonlinear relationships and find application in various fields, including modern drug discovery. They are also used in discriminant and regression data analysis, which is beneficial in screening huge inhibitor libraries and ligand properties based on pharmacophore features, QSAR, docking outputs, and ADMET profiles (Mandlik et al., 2016). However, machine learning needs large amounts of data with specific characteristics, such as standardized high-dimensional drug-target-disease datasets, comprehensive omics data, metadata from successful and unsuccessful clinical trials, training datasets, compound reaction models, gold-standard ADME data, and various protein structures (Vamathevan et al., 2019).
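The pose-selection step described in the docking paragraph above can be sketched as follows: from a list of docked poses, the one with the lowest binding energy is chosen and turned into a numeric input vector for a network. The dictionary keys, the interaction counts, and the energy values are hypothetical and do not reproduce the output format of any particular docking program.

```python
def best_pose(poses):
    """Return the pose with the lowest (most negative) binding energy."""
    return min(poses, key=lambda pose: pose["binding_energy_kcal_mol"])


def pose_to_input_signal(pose):
    """Flatten the selected pose into a numeric input vector for an ANN."""
    return [
        pose["binding_energy_kcal_mol"],
        float(pose["hydrogen_bonds"]),
        float(pose["pi_interactions"]),
    ]


# Made-up docking results for one ligand.
docked_poses = [
    {"pose_id": 1, "binding_energy_kcal_mol": -6.4, "hydrogen_bonds": 2, "pi_interactions": 0},
    {"pose_id": 2, "binding_energy_kcal_mol": -8.1, "hydrogen_bonds": 3, "pi_interactions": 1},
    {"pose_id": 3, "binding_energy_kcal_mol": -7.2, "hydrogen_bonds": 1, "pi_interactions": 2},
]
selected = best_pose(docked_poses)
input_signal = pose_to_input_signal(selected)
```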

CONCLUSION
ANNs can be used to map the relationship between one variable and other variables. Furthermore, they exploit nonlinear relationships and are a powerful predictive tool for data analysis compared with conventional statistical analysis. Although the identification of a lead hit to manage disease pathogenesis is a complex process and understanding the disease is time-consuming, the systematic use of ANNs could help resolve this problem, improve the understanding of disease, and identify new drug candidates. From a future perspective, advanced machine learning systems can be used to drive technical progress and improve the performance of artificial intelligence in drug discovery. In addition, it is important to handle noisy data, which can be managed via advanced deep learning. However, ANNs are not free from limitations, such as overfitting and underfitting, errors in the standard dataset, and the need for abundant and diverse data to construct the standard dataset.

CONFLICT OF INTEREST
There are no conflicts of interest to declare.

FUNDING
This work has not received any funds from national and international agencies.

CONSENT FOR PUBLICATION
Not applicable.

DATA AVAILABILITY
Not applicable.