Non-covalent interactions
He has carried out extensive studies on several non-covalent interactions, including the cation-π interaction, hydrogen bonding and π-π/CH-π interactions. These studies help delineate the factors that are crucial for the subtle modulation of non-bonded interactions. In particular, several aspects responsible for fine-tuning cation-π interactions have been established. He has demonstrated the importance of cation-π interactions in chemistry and biology. His studies indicate that cation-π interactions are the strongest of the non-covalent interactions and that they are dramatically modulated by a) the size of the system, b) solvation, c) the cation-π versus cation-σ binding mode and d) curvature.
Manifestation of cooperativity in chemistry and biology
Dr. Sastry has quantitatively defined the concept of cooperativity. He has demonstrated how two non-covalent interactions mutually influence each other. In particular, the effect of a cation-π interaction on neighboring π-π and H-bonding interactions has been delineated. The concept of cooperativity has also been revisited in H-bonded clusters, where he has shown how cooperativity manifests in different structural forms of H-bonded clusters of water, formamide and acetamide.
Computer-aided drug discovery
Dr. Sastry's work in the field of computer-aided drug discovery (CADD) can be grouped into six categories: a) structure-based drug design, comprising the study of the structural nuances and mechanisms of a number of therapeutic targets such as P-type ATPase, DNA, p38 MAP kinase, choline kinase, phosphodiesterase, aromatase, glycoprotein and 5-lipoxygenase; b) ligand-based approaches such as QSAR and informatics to reduce the dimension of chemical space without losing its diversity; c) probing of target-specific issues such as subtype selectivity and kinase specificity; d) benchmarking of docking protocols and in-depth analysis of structure- and analogue-based approaches to develop strategies and filters for the virtual screening of millions of compounds in the shortest possible time and with high precision; e) database development; and f) in-depth analysis of the non-covalent interactions governing drug-receptor interactions in small systems, extended to bigger systems such as clusters and bio-macromolecules, in order to apply the insights gained to the development of scoring functions for non-covalent interactions.
Molecular modeling
The group has benchmarked most of the CADD protocols, such as docking, virtual screening and molecular dynamics, before employing them to study a scientific problem. One such benchmarked protocol was used to provide insights into modeling the metal-binding sites of 5-lipoxygenase (5-LO), the conformational variations of the H+/K+-ATPase proton pump along the transmembrane region, and the characterization of hydronium ion binding.
Given the lack of crystal structures for membrane proteins and the therapeutic potential of this target, different physiological structures of the ATPase have been modeled to evaluate the antiport mechanism.
Artificial Intelligence and Machine Learning
The influence of Artificial Intelligence (AI) is vast, whether in science, societal issues or environmental issues. Analyzing data with machine learning and deep learning methods can make decision making easier. To make knowledge and information more actionable for scholars, entrepreneurs and technologists, AI solutions, machine learning (ML), big data, computer vision and the Internet of Things, which are the pillars of Industry 5.0, can offer more tangible guidance toward achieving sustainability. AI innovation is already being integrated into areas such as healthcare, agriculture, education, transport, environmental protection and many more. To support AI's rapid progress and promote sustainable development, regulatory oversight and awareness are essential; without them, lapses in ethics, safety and transparency may follow. While the technology, its applications and its utility are greatly valued in the pursuit of sustainability, software development can also be one of the major goals of the scientific community.
Machine learning has been applied in many areas of scientific research since its inception, and in recent years it has made immense contributions to bioinformatics, cheminformatics and drug discovery through predictive models for physical properties, bioactivity, toxicity and the structure-activity relationships of potential drugs. Our group has been actively working in various areas of drug discovery to develop machine learning based predictive models using state-of-the-art algorithms. Using these predictive models, the group has been able to address significant questions such as identifying potential antivirals, understanding why an IND candidate fails in clinical trials, predicting permeability into the blood-brain barrier, and predicting major toxicities such as genotoxicity, cytotoxicity and renal toxicity.
* Antiviral Prediction
Recent pandemics have propelled research efforts in an unprecedented fashion, primarily triggering computational efforts towards new vaccine and drug development as well as drug repurposing. There is an urgent need to design novel drugs with targeted biological activity and minimal adverse reactions that may be useful for managing viral outbreaks. Hence, an attempt has been made to develop machine learning based predictive models that can be used to assess whether a compound is likely to be antiviral. To this end, a set of 2358 antiviral compounds was compiled from the CAS COVID-19 antiviral SAR dataset, with activity reported as IC50 values. A total of 1157 two-dimensional molecular descriptors were computed, among which the most highly correlated descriptors were selected using tree-based, correlation-based and mutual-information-based feature selection methods. Seven machine learning algorithms, namely Random Forest, XGBoost, Support Vector Machine, KNN, Decision Tree, MLP Classifier and Logistic Regression, were benchmarked. The best performance was achieved by the models developed using the Random Forest and XGBoost algorithms with all the feature selection methods. The maximum predictive accuracy of both these models was 88% with internal validation, whereas with an external dataset a maximum accuracy of 93.10% for the XGBoost model and 100% for the Random Forest model was achieved. Furthermore, the study demonstrated scaffold analysis of the molecules as a pragmatic approach to explore the importance of structurally diverse compounds in data-driven studies.
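As a minimal sketch of the correlation-based feature selection step mentioned above, the snippet below ranks descriptor columns by the absolute Pearson correlation with a binary activity label and keeps the top k. The descriptor names, the toy values and the helper names (`pearson`, `select_top_k`) are assumptions made for illustration; the study's actual descriptors, thresholds and tooling are not specified here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_top_k(descriptors, labels, k):
    """Rank descriptor columns by |r| with the label and return the k best names."""
    scores = {name: abs(pearson(col, labels)) for name, col in descriptors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data, invented for illustration: 1 = active, 0 = inactive.
labels = [1, 1, 1, 0, 0, 0]
descriptors = {
    "desc_a": [3.1, 2.9, 3.4, 1.0, 1.2, 0.8],  # tracks the label closely
    "desc_b": [2, 3, 2, 3, 2, 3],              # only weakly related
    "desc_c": [5, 5, 5, 5, 5, 5],              # constant, uninformative
}
selected = select_top_k(descriptors, labels, k=2)
print(selected)
```

In practice the selected columns would then be passed to each of the benchmarked classifiers, so that every algorithm is compared on the same reduced feature set.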
* Assessing Failures in Clinical Trials
One of the major challenges in drug development is maintaining acceptable levels of efficacy and safety throughout all phases of clinical trials, followed by a successful market launch. While many factors, such as molecular properties, toxicity parameters and mechanism of action at the target site, regulate the therapeutic action of a compound, a holistic, data-driven approach will strengthen the predictive toxicological sciences. The aim of the current study is to identify reasons why an investigational candidate would fail in clinical trials after multiple iterations of refinement and optimization. We compiled a dataset comprising approved and withdrawn drugs as well as toxic compounds and used a time-split approach to generate the training and validation sets. Five robust and scalable machine learning binary classifiers were used to develop predictive models, which were trained on features such as molecular descriptors and fingerprints and then validated rigorously against a set of performance metrics. The mean AUC scores of the five classifiers on the hold-out test set were in the range of 0.66–0.71. The models were then used to predict probability scores for the clinical candidate dataset. The top compounds predicted to be toxic were analyzed to estimate different dimensions of toxicity. Through this study, we propose that, with the appropriate use of feature extraction and machine learning methods, one can estimate the likelihood of success or failure of investigational drug candidates, thereby opening an avenue for future computational toxicology studies.
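The time-split idea and the AUC metric used above can be illustrated with a short sketch. It assumes each record carries a year and a binary label, which is a hypothetical layout; the record fields, compound names, cutoff year and scores are all invented for illustration and do not come from the study.

```python
def time_split(records, cutoff_year):
    """Train on records dated before cutoff_year, validate on the rest."""
    train = [r for r in records if r["year"] < cutoff_year]
    valid = [r for r in records if r["year"] >= cutoff_year]
    return train, valid

def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical records: label 1 = withdrawn/toxic, 0 = approved.
records = [
    {"name": "cmpd_1", "year": 1998, "label": 0},
    {"name": "cmpd_2", "year": 2003, "label": 1},
    {"name": "cmpd_3", "year": 2009, "label": 0},
    {"name": "cmpd_4", "year": 2014, "label": 1},
    {"name": "cmpd_5", "year": 2018, "label": 0},
]
train, valid = time_split(records, cutoff_year=2010)

# Invented model scores for four held-out compounds.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2])
print(len(train), len(valid), auc)
```

Splitting by date rather than at random keeps later compounds out of training, which mimics the prospective setting in which such a model would actually be used.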
* Predicting Permeability into the Blood-Brain Barrier
The blood-brain barrier (BBB) is an important defence mechanism that prevents disease-causing pathogens and toxins from entering the brain from the bloodstream. In recent years, many in silico methods have been proposed for predicting BBB permeability; however, the reliability of these models is questionable because they rely on small, class-imbalanced datasets, which leads to a very high false positive rate. In our study, machine learning and deep learning based predictive models were built using XGBoost, Random Forest and Extra Trees classifiers and a deep neural network. A dataset of 8153 compounds comprising both BBB-permeable and BBB-non-permeable compounds was curated, and molecular descriptors and fingerprints were calculated to generate the features for the machine learning and deep learning models. Three balancing techniques were then applied to the dataset to address the class-imbalance issue. A comprehensive comparison showed that the deep neural network model built on the balanced MACCS fingerprint dataset outperformed all the other models, with an accuracy of 97.8% and a ROC-AUC score of 0.98. Additionally, a dynamic consensus model was prepared from the machine learning models and validated with a benchmark dataset for predicting BBB permeability with higher confidence scores.
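The three balancing techniques are not named here, so the sketch below shows only one common option, random oversampling of the minority class, as an assumption about what such a step can look like. The function name, the toy fingerprints and the class labels are hypothetical.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes match in size."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy bit-vector "fingerprints": five permeable (1) and two non-permeable (0) compounds.
samples = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
labels = [1, 1, 1, 1, 1, 0, 0]
balanced_x, balanced_y = random_oversample(samples, labels)
print(balanced_y.count(1), balanced_y.count(0))
```

Balancing is normally applied to the training portion only, so that the held-out data still reflects the original class ratio when accuracy and ROC-AUC are reported.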
* Predicting Toxicities
Renal toxicity prediction plays a vital role in drug discovery and clinical practice, as it helps to identify potentially harmful compounds and mitigate adverse effects on the renal system. Compounds with inherent renal-toxic potential are a major concern in drug development because they lead to failures in drug discovery. Predicting the nephrotoxic probability of a compound at an early stage can reduce the drug failure rate, so it is crucial to develop a way to analyze the renal toxicity of a drug candidate quickly and reliably. To mitigate the risks associated with renal toxicity, predictive models leveraging machine learning and deep learning techniques have gained significant attention. In one of our studies, 287 human renal-toxic drugs and 278 non-renal-toxic drugs were collected to develop a deep learning model and 27 machine learning models using 8 kinds of fingerprints and RDKit descriptors. The deep neural network (DNN) model showed better generalization scores in five-fold cross-validation, while the Extra Trees model showed better performance on the test data. Structural alerts, that is, specific chemical substructures associated with renal toxicity, offer a valuable tool for early toxicity assessment. Therefore, the substructures of renal-toxic compounds were studied by applying an association rule mining technique based on frequent itemset patterns.
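To make the association rule mining step concrete, the sketch below treats each compound as a set of substructure labels plus an optional "toxic" tag, finds itemsets above a support threshold, and computes the confidence of a rule of the form substructure → toxic. The fragment names (`frag_A` and so on), the data and the threshold are placeholders, and the brute-force enumeration stands in for whatever frequent-itemset algorithm the study actually used.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for all itemsets up to max_size meeting min_support.

    Brute-force enumeration, fine for a toy example; it does no Apriori-style pruning.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= t)
            support = count / n
            if support >= min_support:
                result[frozenset(combo)] = support
    return result

def rule_confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent: P(consequent | antecedent)."""
    antecedent = set(antecedent)
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return 0.0
    return sum(1 for t in matching if consequent in t) / len(matching)

# Placeholder compounds: each set lists the fragments present, plus "toxic" if renal-toxic.
transactions = [
    {"frag_A", "frag_B", "toxic"},
    {"frag_A", "toxic"},
    {"frag_A", "frag_B", "toxic"},
    {"frag_C"},
    {"frag_B"},
    {"frag_A", "frag_C"},
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
confidence = rule_confidence(transactions, {"frag_A"}, "toxic")
print(sorted(tuple(sorted(s)) for s in itemsets), confidence)
```

A fragment whose rule toward "toxic" has both high support and high confidence would be a candidate structural alert, subject to expert review.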