Researchers at the Yale School of Medicine and Google have developed an artificial intelligence model that has identified a new potential treatment for cancer. The AI, known as Cell2Sentence, suggested the use of a drug called Silmitasertib to enhance the body’s ability to locate cancerous tumors. This discovery is notable because no prior research had indicated that Silmitasertib could be effective in this manner.
Professor David van Dijk, who leads the lab at Yale, expressed surprise at the AI’s suggestion, emphasizing that it was not based on existing literature. The model hypothesized that the drug could improve antigen presentation, which plays a crucial role in helping the immune system recognize and combat rogue cells. Following this hypothesis, van Dijk’s team conducted experiments using human skin and pulmonary cells, confirming the AI’s predictions.
The findings were published in a preprint paper in October 2024, resulting from a collaboration between Yale’s lab and Google’s top AI research divisions, Google DeepMind and Google Research. Van Dijk stated, “This is the first time that LLMs have been used to model gene expression and made a prediction about the effect of a drug that was then experimentally validated.”
Innovative Approaches to Cellular Biology
The research team has been exploring new methodologies for understanding human cells. By examining single-cell RNA sequences, they aim to measure gene activity within individual cells. Graduate student Syed Rizvi, a lead author of the October paper, explained that understanding genetic patterns would facilitate the identification of healthy versus unhealthy cells.
The initial phase involved using natural language processing, an AI technology that enables machines to comprehend human language. The vast numerical data from single-cell RNA sequences was challenging for traditional models to interpret. By converting this data into sentence-like structures, the researchers enabled the AI to identify biological patterns effectively.
For instance, a cellular sentence might consist of a series of gene labels such as “Gene A Gene B Gene C Gene D,” which corresponds to the approximately 18,000 gene types in their datasets. This innovative approach led to the creation of a model capable of consistently identifying cell types based on these sentences, a significant advancement in the field of cellular biology.
Initially, the team relied on the GPT-2 model, an older version of OpenAI’s language models, which has limited capacity compared to newer technologies. With only around 774 million parameters, it struggled to grasp the complex biological rules. The introduction of more advanced models could enhance their research outcomes.
Collaboration with Google Enhances Research Capacity
In 2024, a pivotal AI workshop hosted by Google at Yale facilitated the collaboration between the two teams. According to Shekoofeh Azizi, a scientist at Google DeepMind, this partnership emerged organically. The collaboration provided Yale’s researchers with access to extensive computing resources, reportedly the largest of any company globally, which includes resources equivalent to over a million GPUs compared to Yale’s six high-performance computing clusters.
Van Dijk noted the necessity of industry partnerships to scale research effectively. The collaboration allowed the team to transition from the GPT-2 model to Google’s advanced model, Gemma-2, expanding Cell2Sentence to 27 billion parameters and significantly increasing the data volume processed.
The enhanced capabilities of the large-language models enabled the researchers to predict drug effects on human cells, a process they termed “perturbation response prediction.” Utilizing this model, they identified Silmitasertib as a candidate drug to amplify immune signals in the presence of sickness-related proteins. Subsequent laboratory tests confirmed the AI’s hypothesis, demonstrating that large-language models can effectively conduct complex biological reasoning.
Azizi remarked that this form of hypothesis generation showcases the potential of large-scale AI systems in accelerating drug discovery processes.
Future Implications for Drug Development
The implications of this research extend beyond this immediate discovery. The developer of Silmitasertib, Senhwa Biosciences, is currently investigating this new application of its product. Van Dijk and Azizi anticipate that their findings could lead to faster and more cost-effective drug development in the medical field.
A study in 2024 estimated the average cost to develop a new drug, including failures, at around $500 million. Major pharmaceutical companies often invest billions in research annually, with many projects failing during pre-clinical trials. Van Dijk suggested that their AI model could streamline this phase, directing researchers toward experiments with higher success probabilities.
While the prospects are promising, scaling AI capabilities remains crucial. Van Dijk emphasized the importance of continued development, stating, “We got a clear signal that scaling matters.” Ultimately, the goal is to create a comprehensive AI model capable of simulating the human body, allowing for expedited drug testing without the ethical concerns associated with human trials.
Van Dijk likened this vision to achieving the “holy grail of drug development,” where researchers can efficiently test a multitude of drugs and identify those that are both effective and safe.
The lead authors of the October preprint included Yale’s Rizvi, postdoctoral associate Daniel Levine, Aakash Patel, Shiyang Zhang, Curtis Jamison Perry, and Google DeepMind’s Eric Wang. This groundbreaking work not only highlights the potential of AI in medicine but also underscores the value of interdisciplinary collaboration in addressing complex health challenges.