QSAR and Molecular Docking of Phthalazine Derivatives as Epidermal Growth Factor Receptor ( EGFR ) Inhibitors

Article history: Received on: 26/01/2017 Accepted on: 03/03/2017 Available online: 30/04/2017 After over half a century of chemotherapy research, cancer has remained as one of the most life-threatening diseases to treat. In the present study, a series of phthalazine derivatives as anticancer agent was examined to determine the structural requirement for epidermal growth factor receptor (EGFR) inhibition by threedimensional quantitative structural activity relationship (3D-QSAR) using comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) methods. Evaluation of 20 compounds (training set) served to establish model, which was validated by evaluation of a set of 08 compounds (test set). The lowest energy conformer of the most active molecule obtained from the systematic search was used as the template structure for alignment of data set. The optimum partial least square analysis (PLS) for CoMFA and CoMSIA models exhibited good ‘leave-one-out’ cross-validated coefficient (q 2 ) of 0.736 and 0.806, the coefficient of determination (r 2 ) of 0.964 and 0.976 and good predictive power of (r 2 pred) of 0.826 and 0.792 respectively. Docking was carried out to identify mode of interaction with EGFR. The final model of QSAR along with information assembled from contour maps and docking study may be used for design ing novel phthalazine derivatives as potent anticancer agents.


INTRODUCTION
Cancer is a major worldwide health problem.Although there has been a progress in the treatment and prevention of cancer, this disease remains the second major cause of death in the world (Amin et al., 2016).Cancer study suggests that tyrosine kinase receptor play important role in regulating cancer (Atlay et al., 2003).Epidermal growth factor receptor is type of membrane bound tyrosine kinase receptor which play important role in treatment of cancer (Tremont-Lukats and Gilbert, 2003).EGFR plays a key role in numerous processes that affect tumour growth and progression, including proliferation, differentiation, angiogenesis, inhibition of apoptosis and invasiveness (Oliveira-Cunha et al., 2011).Over expression of a specific receptor tyrosine kinase on the cell surface increases the incidence of receptor dimerization leads to uncontrolled cell proliferation and tumour formation (Nair, 2005).Currently, large numbers of epidermal growth factor receptor inhibitors are approved including gefitinib, erlotinib, lapatinib, vandetanib etc.
A series of phthalazine derivatives as epidermal growth factor receptor has been reported (Amin et al., 2016).In order to derive correlation between the structure and inhibitory activity of these inhibitors, we performed a three-dimensional quantitative structure activity relationship (3D-QSAR) study using comparative molecular field analysis, (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) (Verma et al., 2010).
CoMFA is used to determine the relationship of steric and electrostatic field with biological activity of compounds.While CoMSIA method was introduced by Klebe et al (1994), which considers hydrogen bond donor, hydrogen bond acceptor and hydrophobic descriptors, in addition to steric and electrostatic features (Klebe et al., 1994).
In this paper, 3D-QSAR studies using CoMFA and CoMSIA methods are applied to generate quantitative models and to specify the region where modification can be carried out to improve the inhibitory activity of compounds.The predictive ability of generated model was validated by external validation method.Further, molecular docking was carried out to identify binding mode of interaction with active site of EGFR.

Datasets
A set of 28 phthalazine derivatives reported as epidermal growth factor receptor were taken from literature for this study (Amin et al., 2016).Using the 'create set and random method' option in QSAR project of SYBYL-X 2.0, the compounds were divided arbitrarily into a training set of compounds (70%) and a test set of compounds (30%) (Juvale et al., 2006).Training set and test set were used to generate 3D-QSAR models and validation of generated models.The activity of compounds were assessed with IC values i.e.IC 50 (nM) which was converted into pIc50 (-logIC50) (Modi and Kulkarni, 2016).Using Partial Least Square (PLS) regression analysis the logarithmic affiliation helped to obtain symmetrically distributed data (Kharkar et al., 2002).The Structure of phthalazine derivatives and their inhibitory activity data are shown in Table 1.

Alignment and Molecular modelling
In the study of 3D-QSAR, alignment is one of the most important steps.There are various alignment techniques in which molecules are aligned with comparable orientation and space conformation.SYBYL-X 1.3 (Tripos Associates Inc, St Louis, Mo, USA) was utilized to perform all the molecular modelling study.Sketch function was used to design the 3D structure and subsequently Gasteiger-Huckel charges applied to all compounds.Energy minimization was carried out using the Standard Tripos molecular mechanism force field.Here, the distil alignment function was performed.The compound 8 having the highest activity was selected as template for alignment in the data set.Therefore, all the conformers were superimposed on each other and the common core structure formed which has been represented in Figure 1.All the molecules are aligned and shown in Figure 2.

CoMFA and CoMSIA fields Generation
For each alignment, the steric and electrostatic potential fields for CoMFA were calculated at each lattice intersection of a regularly spaced grid of 2.0 Å in all X, Y and Z directions.The van der Waals potential and columbic term, which represent respectively, electrostatic and steric fields, were calculated by use of Tripos force field.A sp 3 carbon atom with van der Waals radius of 1.52 Å and +1.0 charges was served as the probe atom to calculate steric and electrostatic fields.The steric and electrostatic contributions were truncated to default ±30 kcal/mol, and the electrostatic contribution was ignored at lattice intersections with maximum steric interaction.
CoMSIA is an extension of CoMFA on the same assumption that changes in binding affinities of ligands are related to changes in molecular properties represented by the field.Besides, steric and electrostatic, hydrogen bond donor, hydrogen bond acceptor and hydrophobic descriptors are calculated in CoMSIA.A Gaussian function was introduced to determine the distance between probe atom and molecular atom at all grid point similarity indices at the molecular surface can be calculated in CoMSIA (Zheng et al., 2011).The equation for CoMSIA is as follow:

…1
Where, A is the similarity index at grid point q, summed over all atoms i of the molecule j under investigation.W probe , k is the probe atom with radius 1 Å, charge +1, hydrophobicity +1, hydrogen bond donating +1 and hydrogen bond accepting +1.W ik is the actual value of the physicochemical property k of atom i. r iq is the mutual distance between the probe atom at grid point q and atom i of the test molecule.α is the attenuation factor whose optimal value is normally between 0.2 and 0.4, with a default value of 0.3 (Böhm et al., 1999).

Partial least square analysis and model validation
For development of 3D-QSAR, CoMFA and CoMSIA studies were carried out using partial least square (PLS) approach which is an extension of multiple regression analysis (Buolamwini and Assefa, 2002).All the data set of definite molecules was further treated by using PLS analysis technique and development of 3D contour maps with an optimum number of components 5 equally.PLS algorithm was used to develop the correlation between the structural property and pharmacological activity.By use of PLS, leave one out (LOO) and cross-validation analysis was performed.In cross-validation method one molecule is subtracted from the data set and its activity is predicted referencing the model obtain from rest of the data set.The cross-validation coefficient is represented as q 2 .The models were accepted if model provides value of q 2 > 0.5 and r 2 > 0.641 (Golbraikh and Tropsha, 2002).It is generally estimated as: …2 While, the validation of conventional correlation co-efficient r 2 , standard error of estimate (SEE) and F values were carried out in non-cross validation method.At the end bootstrap analysis was performed to check the robustness of the generated model, It is a method which is carried out numerous times (for good statistical information 100 times required) in which n random selections are carried out from the original set of n object, During every run, certain molecules can be omitted from the PLS analysis, while remaining molecule must be involved many times.Bootstrap r 2 (r 2 bs ) represented mean correlation coefficient.For CoMFA and CoMSIA analysis cross-validation (r 2 cv ) was carried out by two groups 'leave half out' method (Bhansali and Kulkarni, 2014).

Predictive correlation coefficient (r 2
pred ) The test set of eight compounds was used to determine the predictive power of generated 3D-QSAR model.Template structure was used to align the compounds and their pIC 50 values were predicted.Based on the test set compounds, the predictive correlation coefficient (r 2 pred ) was determined by the following equation: …3 where, SD is the totality of squared deviation between biological activity of the test set compounds and mean activities of the training set compounds, and PRESS is the totality of squared deviations between experimental and predicted activity values for each compound in the test set (Raoet.al, 2014).

Docking
Glide 5.8 (Schrodinger, LLC, New York) accessed on Windows 7 was utilized for preparation of protein and docking study (Friesner et al., 2004).The deposited crystal structure of epidermal growth factor receptor complex with erlotinib was retrieved from Protein Data Bank (PDB: 1M17).After introducing, the protein was treated under several structure requirements such as bond assignment and bond order, hydrogen addition, chains filling, bond adjustment and addition of charges to metal and correction of mislabeled components.To augment the absent residues in the side chain Maestro was utilized.In structure of protein water molecules were deleted and subsequently, hydrogen added (Halgren et al., 2004).In the structure where steric clashes present, protein structure minimization was performed by Impact Refinement module, incorporating the OPLS 2005 (Optimized Potential for Liquid Simulation) force field.While, minimization was finished when RMSD (Root Mean Square Deviation) reaches to a cut-off of 0.30 A Receptor grid was generated and to recognize the active site, the ligand was selected to define the position and size of active site.The interaction site is represented by rectangular box enclosing the translations of the mass centre of the ligand.Orientations and conformations of the ligand in the binding site were done using Glide (Grid-based Ligand Docking with Energetics).
Maestro 9.3 and LigPrep 2.5 were accessed for building ligand and preparation respectively.Before docking study, conversion of 2D structure into 3D, generation of stereoisomer, neutralization of charged structures and addition of hydrogen were carried out with help of ConfGen by OPLS-2005 force field (Chang et al., 1989).Monte-Carlo Multiple Minimum (MCMM)/Low Mode (LMOD) with per structure maximum 1000 conformers and 10000 minimization steps were utilized for exploration of Conformational space (Kolossváry and Guida, 1996).

CoMFA studies
The training set and test set was utilized to develop CoMFA model.For model, partial least square method was carried out with Leave One leave out which demonstrated the value of q 2 = 0.736 through optimum 5 components.Column filtering 2.0 and same five components was utilized for non cross-validated (r 2 ncv ) PLS analysis, which gives r 2 ncv = 0.964, significance value F = 116.231,standard error of estimation (SEE) = 0.116 and predictive power r 2 pred of 0.826.A contribution of Steric and electrostatic fields were found to be 2.902 and 1.664, respectively.Results obtained by CoMFA analysis is represented in

CoMSIA studies
Same training set and test set was utilized for CoMSIA model development because, significant results were found with CoMFA.Steric, electrostatic, hydrophobic, hydrogen bond acceptor and hydrogen bond donor fields were used for generation of CoMSIA model with various combinations of these molecular descriptors as shown in Table 3.The statistical quality of hybrid models was examined by studying the corresponding q 2 values.The model generated using descriptors steric, electrostatic; hydrogen bond acceptor and hydrophobic field were found to be best CoMSIA model.So this model was further utilized for analysis.The cross-validation (q 2 ) value for corresponding CoMSIA model was obtained 0.806 by five optimum numbers of components (ONC).
Column filtering 2.0 and similar five components was utilized for non-cross-validated (r 2 ncv ) PLS analysis, resulting in r 2 ncv = 0.976 and SEE = 0.094.The steric contribution = 0.657, electrostatic contribution = 0.832, hydrophobic contribution = 0.936, hydrogen bond acceptor = 1.241, predictive power of CoMSIA r 2 pred was found to be 0.792.Leave half out cross-  validation method and boot strapping analysis was performed to determine the quality of developed model.For the CoMSIA, r 2 cv was found to be 0.761.
To analyse the internal reliability within the dataset the mean r 2 value of bootstrapping analysis (bootstrapped r 2 bs ) and SEE bs were performed which was found to be 0.987 and 0.068 respectively.Statistical parameter obtained by CoMSIA model is represented in Table .3.

CoMFA
The significant feature of CoMFA model is the outcome obtained by 3D coefficient contour maps which are calculated as the variation in the molecular fields multiplied by the 3D-QSAR coefficient by using Model stDev*Coeff.CoMFA contour maps were generated to identify the important regions in 3D space surrounding the molecules, so that modification can be carried out in those areas to increase the inhibitory activity, which may be utilized to improve EGFR inhibitory activity.The most active compound 8 and least active compound 1were used to generate contour maps by managing style of contour to transparent for better analysis of contour surrounding compound 8 which represented in Figures 4 (a, b) and for compound 1 represented in Figure 4 (c, d) includes steric and electrostatic region respectively.
The steric region represents two colours in contour maps i.e. green and yellow.In which green color indicates the favourable part, keeping the bulkier group which leads to an increase in the biological activity whereas; yellow color indicates a decrease in the biological activity due to the bulkier region.Further, the electrostatic contour map shows red and blue color.The red color and blue color indicates the favourable and unfavourable region respectively.Here, red color region indicate that biological activity enhanced by negative charge while, blue color region indicates positive charge to increase in biological activity.

CoMSIA
CoMSIA contour maps were generated similarly as contour maps generated by CoMFA.For the CoMSIA total 5 contour maps were generated; for steric, electrostatic, hydrophobic, donor and acceptor fields.The steric and electrostatic region has the same description like CoMFA.Whereas; the hydrophobic has yellow and white color codes in which yellow color indicate hydrophobic group favorable; while white color indicates hydrophobic group unfavorable.The donor has cyan and purple color; cyan color indicates donor group favorable; while purple indicates acceptor group favorable.They show which part/substitute can help to find out the favorable and unfavourable region.

CoMFA
Figure 4 (a, c) depicts the CoMFA steric contour plot.Whereas, Figure 3 represents most active compound 8 is divided into two major regions (A and B).A large cloud of green contour in A region at -NH group indicate that introduction of bulky group is favored for activity, while at para position of phenyl group bulky substituent does not favored.This is evident from experimental activity value for compound 1-5 in which bulky group such as SO 2 NH 2 , SO 2 NHR etc. lead to least EGFR inhibitory activity.Figure 4 (b, d) displays the electrostatic contour map using CoMFA.The electrostatic fields are represented by blue and red contour map as seen in figure 4 (B) a blue contour near A region at Phthalazine ring and at B region -NH group indicate that need of positively charged substituent for electrostatic interaction with receptor to show potent inhibitory activity.While, red contour found in B region at phenyl ring suggest that introduction of negatively charged substituent at phenyl ring favored for activity.

CoMSIA
Contour map for steric and electrostatic fields in CoMSIA models are almost the same as those in the CoMFA model.Few more contours are seen for the steric and electrostatic fields which are elaborated here.The green contour seen near -NH group indicate that introduction of bulky group at this region is favored for activity.In figure 5B at B region in phenyl ring red contour suggest that introduction of negatively charged substituent leads to increase in inhibitory activity.
The yellow region in CoMSIA hydrophobic contour plot indicates that hydrophobic substituent in this region enhance inhibitory activity while white contour map indicates that hydrophilic substituent will improve activity.In Figure 5C in B region at phenyl ring yellow contour surrounded by white contour indicate that introduction of small hydrophobic substituent increase activity.In CoMSIA acceptor map a large cloud of magenta contour at Phthalazine ring indicate that introduction of hydrogen bond acceptor enhance inhibitory activity.

Docking
Molecular docking was carried out to investigate mode of interaction between compounds to the EGFR receptor, to obtain selective information for the further structure optimization.Docking studies were carried out using Glide Extra precision (XP) docking module.The docking result of compound 8 with epidermal growth factor receptor is shown in figure 7.
The glide docking score for compound 8 was found to be -5.466.From the docking study it is observed that Phthalazine ring interact with LYS 721 by ∏-cation interaction, while -CN functional group of phenyl ring with H bond.While another active molecule compound 11 in which -NH attached to phthalazine ring shows interaction with ASP 831.

CONCLUSION
In the present study, CoMFA and CoMSIA are performed using set of epidermal growth factor receptor inhibitors.Partial Least Square (PLS) analysis was performed in order to correlate the CoMFA and CoMSIA descriptors with the observed experimental inhibitory activity.A significant 3D-QSAR model was generated.This model was further validated by various statistical parameters and all were found to be significant with excellent predictability.The model obtained from CoMFA and CoMSIA have the values of q 2 = 0.736, r 2 ncv = 0.964, ONC = 5; q 2 = 0.806, r 2 ncv = 0.976, ONC = 5 respectively.The predictive power of the model was validated by using test set of eight compounds and was found to be the values of r 2 pred as 0.826and 0.792of CoMFA and CoMSIA respectively.To check the robustness and statistical confidence of the derived models, the boot-strapping analysis was performed.From the contour map study from each model it was observed that introduction of bulky group at -NH group favored for activity, while introduction of electropositive group at Phthalazine ring leads to increase in activity.Further, docking results reveals that amine attached to Phthalazine ring play important role in interaction with receptor.Literature study as well as our current finding strongly support that those inhibitors are able to interact with ASP and/or GLU amino acids may shows potent inhibitory activity against EGFR receptor and become potent anticancer agents.
Hence the CoMFA and CoMSIA models along with docking study may be used further to design novel Phthalazine derivatives as the potent epidermal growth factor receptor inhibitor for treatment of cancer.

Fig. 1 :
Fig. 1: Fragment used as a common structure for aligning database for generation of CoMFA and CoMSIA models.

Fig. 4 :
Fig. 4: Steric contour maps (a, c) and electrostatic contour maps (b, d) generated by comparative molecular field analysis (CoMFA) for the most active compound 8 (a, b) and the least active compound 1 (c, d), respectively.

Fig. 7 :
Fig. 7: 2D view of binding interaction of the most active compound 8 with the active site of receptor.

Table 2 .
The crossvalidation and bootstrapping result strongly support reliability of the CoMFA model.The experimental and predicted pIC 50 values for the training set and test set are shown in Table 4 and 5 respectively.

Table 3 :
Statistical parameter by using PLS analysis for CoMSIA.

Table 4 :
Experimental, predicted pIC50 and residual values of training set by CoMFA and CoMSIA analysis.

Table 5 :
Experimental, predicted pIC50 values and residual values of test set compound by CoMFA and CoMSIA analysis.