Insilico uses Microsoft’s BioGPT to find targets for aging and disease

A novel approach to predicting therapeutic targets with a biomedicine-trained large language model has identified 9 potential aging targets.

Insilico Medicine, a clinical-stage generative AI-driven drug discovery company, has announced that it used Microsoft BioGPT to identify targets against both the aging process and major age-related diseases.

Longevity.Technology: ChatGPT, the AI chatbot, can craft poems, write web code and plan holidays. Large language models (LLMs) are the foundation of chatbots like ChatGPT; trained on vast amounts of text data, they have contributed to advances in diverse fields including literature, art and science, but their potential in the complex realms of biology and genomics has yet to be fully unlocked.

Insilico used the connection retrieval ability of Microsoft BioGPT to identify 9 potential dual-purpose targets against both the aging process and 14 major age-related diseases. Two of the proposed genes have not previously been correlated with the aging process, indicating the potential of Transformer models in novel target prediction and other ranking tasks across the biomedical field [1]. The findings were published in the journal Aging.

According to recent publications, the majority of LLMs are trained to continue text: given the preceding context, the model assigns a probability to each possible next word and suggests the most likely one. Given a plausible prompt and adequate background data, scientists can now apply LLMs, especially specialized models, to the target prioritization process.
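As a rough illustration of the next-word mechanism described above, the sketch below converts raw scores for a few candidate tokens into a probability distribution with a softmax and picks the most likely token. The token names and scores are hypothetical placeholders, not BioGPT outputs, and a real model scores its entire vocabulary rather than three candidates.

```python
import math

def next_token_distribution(logits):
    """Convert raw per-token scores into probabilities with a softmax."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical scores for three candidate next tokens (illustrative only).
toy_logits = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.1}

probs = next_token_distribution(toy_logits)
best_token = max(probs, key=probs.get)  # the token the model would suggest
```

The probabilities sum to one, so they can be compared across candidate tokens or aggregated later, which is what makes them usable for ranking.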

BioGPT, the domain-specific generative Transformer language model, was jointly proposed by Microsoft Research and Peking University in China. Pre-trained on millions of previously published biomedical research articles, the model outperformed previous models in multiple biomedical natural language processing tasks and demonstrated human parity in analyzing biomedical research to answer questions.

To further enhance the performance of BioGPT, Insilico researchers additionally trained the model on a dataset of 900,000 grant proposals from the National Institutes of Health, and evaluated the effect through enrichment log fold change (ELFC) and hypergeometric p-value (HGPV) scores. Next, the team established a target discovery pipeline including the prompt, retrieval probability of tokens, and gene probability calculation.
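The article does not define how the ELFC and HGPV scores were computed, so the following is only a generic sketch of the two statistics under one common reading: take the top-ranked genes from a model, count how many are already known targets, and compare that count with what chance would give. The function names, arguments and numbers are all assumptions made for illustration, not the published method.

```python
import math

def hypergeom_pvalue(population, known, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement
    from `population` genes, of which `known` are known targets."""
    upper = min(known, drawn)
    total = math.comb(population, drawn)
    tail = sum(
        math.comb(known, i) * math.comb(population - known, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total

def enrichment_log_fold_change(population, known, drawn, hits):
    """log2 of the observed hit rate over the hit rate expected by chance."""
    if hits == 0:
        return float("-inf")  # no hits: enrichment is zero
    observed = hits / drawn
    expected = known / population
    return math.log2(observed / expected)

# Hypothetical numbers: 100 genes, 10 known targets, top 10 predictions, 5 hits.
elfc = enrichment_log_fold_change(100, 10, 10, 5)
hgpv = hypergeom_pvalue(100, 10, 10, 5)
```

Under this reading, a higher ELFC together with a lower HGPV would suggest that the additional training pushed known targets toward the top of the ranking more than chance alone would.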

Using the final prompt sentence of “human gene targeted by a drug for treating {DISEASE} is the,” and the general tokenizer from BioGPT, the researchers proposed 9 potential targets after several cycles of probability retrieval. In the end, 5 targets were nominated as dual-purpose targets against aging and all 14 age-related diseases including Alzheimer’s disease, amyotrophic lateral sclerosis, and idiopathic pulmonary fibrosis. Both CCR5 and PTH are considered novel age-related targets.
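To make the prompt-and-ranking step more concrete, here is a minimal sketch that fills the quoted prompt template for each disease and then ranks genes by their average probability across diseases. Only the template text comes from the article; the per-disease probabilities are hypothetical placeholders standing in for model output, and averaging is an assumed aggregation rule rather than the one the researchers necessarily used.

```python
PROMPT_TEMPLATE = "human gene targeted by a drug for treating {DISEASE} is the"

def build_prompts(diseases):
    """Fill the template once per disease name."""
    return {d: PROMPT_TEMPLATE.format(DISEASE=d) for d in diseases}

def rank_genes(per_disease_probs):
    """Average each gene's probability over all diseases (0 where absent)
    and return (gene, mean probability) pairs, highest first."""
    genes = {g for probs in per_disease_probs.values() for g in probs}
    n = len(per_disease_probs)
    mean = {
        g: sum(p.get(g, 0.0) for p in per_disease_probs.values()) / n
        for g in genes
    }
    return sorted(mean.items(), key=lambda kv: kv[1], reverse=True)

diseases = ["Alzheimer's disease", "idiopathic pulmonary fibrosis"]
prompts = build_prompts(diseases)

# Hypothetical gene probabilities per disease; a real pipeline would derive
# these from the model's token probabilities for each prompt.
toy_probs = {
    "Alzheimer's disease": {"GENE_A": 0.6, "GENE_B": 0.3},
    "idiopathic pulmonary fibrosis": {"GENE_A": 0.5, "GENE_C": 0.4},
}
ranking = rank_genes(toy_probs)
```

A gene that scores well for every disease rises to the top of such a ranking, which matches the idea of a dual-purpose target that is relevant across many age-related conditions.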

“I am thrilled to see this breakthrough based on LLMs presented by the Insilico team, as it highlights the potential of a Transformer and generative AI approach combined with specific databases,” said Alex Zhavoronkov, PhD, founder and CEO of Insilico Medicine. “We hope to further accelerate drug R&D processes using our proprietary Pharma.AI platform in this era of biotech paradigm change.”

“BioGPT can learn and understand large amounts of medical literature, thereby empowering practical processes including novel drug research and development, medical knowledge graph development, precision medicine, and medical dialogue assistance systems, and driving new biotechnology developments,” said Tao Qin, PhD, Senior Principal Researcher at Microsoft Research AI4Science.

“The research results released by Insilico Medicine shed light on new practical application scenarios for BioGPT and other LLM-based AI engines. We look forward to further real-world applications and more breakthroughs.”

A leader in generative AI for drug discovery, Insilico Medicine has established and validated its proprietary end-to-end Pharma.AI platform across target discovery, small molecule generation and clinical trial design. Recently, the company published the validation results of inClinico in Clinical Pharmacology and Therapeutics, where the Transformer-based clinical trial prediction tool achieved 79% accuracy in prospective validation [2].