by Giuliana Miglierini
Generative AI is perhaps the most advanced form of artificial intelligence available today, as it can create new content (text, images, audio, video, objects, etc.) based on the data used to train it. Its applications are not limited to well-known examples such as the ChatGPT chatbot, used to write complex texts, or algorithms that produce striking images.
Generative AI is becoming a new paradigm in drug discovery, as it promises to greatly reduce both the time and the cost of developing new molecules, or of repurposing existing ones for new indications. This is a fundamental goal for pharmaceutical companies, given that the average cost of developing a new medicine is estimated at $2.6 billion.
Algorithms can be trained on the physico-chemical characteristics and 3D shapes of molecules in order to generate completely new molecules of interest for a given application, and/or to predict their behaviour in a biological context (e.g. binding to a specific receptor). We summarise the current status of the implementation of generative AI in the field of drug development.
Quintillions of data
It seems ages since the first draft sequence of the human genome was announced in 2000. Since then, vast amounts of genomic and other biological data have rapidly accumulated. To give an idea, the National Human Genome Research Institute estimates that between 2 and 40 exabytes (i.e. quintillions of bytes) of data will be available within the next decade. The number becomes even larger when considering other domains relevant to drug development, including chemical structures and properties, complex biochemical pathways, 3D structures of proteins and receptors, data on the efficacy and toxicity profiles of already approved medicines and of candidates in the pipelines, etc.
Needless to say, the parallel growth of interest in artificial intelligence over the last twenty years has been fundamental in making available new technologies able to digest, extract and analyse these extremely large datasets.
Machine learning and deep learning algorithms represented just the first step towards this goal. Generative AI followed as a consequence; its birth is attributed to a paper by Ian Goodfellow et al., published in 2014.
Opportunities and challenges of generative AI for drug discovery
The implementation of generative AI in the pharmaceutical and medtech sectors may generate an estimated economic value of $60-110 billion per year, according to the report by McKinsey and Co. “Generative AI in the pharmaceutical industry: Moving from hype to reality”.
More specifically, McKinsey analysed 63 generative AI use cases in life sciences, calculating the potential economic impact for different domains. The highest values ($18-30 bln) are expected for the commercial domain, followed by research and early discovery ($15-28 bln) and clinical development ($13-25 bln). The least impacted domains appear to be enterprise ($8-16 bln), operations ($4-7 bln) and medical affairs ($3-5 bln).
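As a quick consistency check, the per-domain ranges quoted above can be added up and compared with the headline estimate of $60-110 billion per year. The short Python sketch below does only that arithmetic; the dictionary name and layout are illustrative choices, and the figures are simply those reported in the text.

```python
# Per-domain annual value ranges (low, high) in billions of US dollars,
# copied from the figures quoted in the article.
DOMAIN_RANGES_BLN = {
    "commercial": (18, 30),
    "research and early discovery": (15, 28),
    "clinical development": (13, 25),
    "enterprise": (8, 16),
    "operations": (4, 7),
    "medical affairs": (3, 5),
}

# Sum the lower and upper bounds separately across all domains.
total_low = sum(low for low, _ in DOMAIN_RANGES_BLN.values())
total_high = sum(high for _, high in DOMAIN_RANGES_BLN.values())

print(f"Sum of domain ranges: ${total_low}-{total_high} bln per year")
```

The domain ranges add up to $61-111 bln, which is in line with the rounded headline range of $60-110 billion.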
Implementing generative AI may prove far from easy for pharma companies, as it has to fit within an already complex organisation and comply with the strict regulatory requirements typical of the pharmaceutical lifecycle. An important message emerges from McKinsey’s analysis: it is of paramount importance to leave behind the hype surrounding generative AI and understand exactly what it can and cannot do.
The issue is highly complex, and solving it requires multiple skills (data scientists, researchers, medical affairs, legal, risk and business functions) working jointly to set up the solution best suited to each company. The availability of a proper data infrastructure is just the first step: the chosen generative AI model has to be adapted to the complexity of the specific use case, focusing on key applications to avoid disrupting the business.
According to an analysis by Boston Consulting Group, generative AI may also prove useful for including unstructured data among the data sources used by the pharmaceutical industry. This may be a challenging goal to achieve, as data access and management must fulfil regulatory requirements, for example with regard to the possibility of using data generated in clinical trials to support regulatory approval.
Governance of generative AI must also reflect the key principles established in the EU for AI systems, which “must be ‘safe, transparent, traceable, non-discriminatory and environmentally friendly,’ as well as ‘overseen by people, rather than by automation, to prevent harmful outcomes’.”
The need to integrate generative AI with human activities will probably require companies to redesign core processes. In this respect, selecting the most suitable AI infrastructure and platform may prove critical to the success of the initiative. Flexibility and integration with existing AI tools are among the other main features to keep in mind. No less important is the choice of the right partners, who should fit the company’s strategic business goals.
Many algorithms already available
The first AI applications based on deep learning algorithms were used, for example, to predict the sequence and structure of complex biological molecules. This is the case of AlphaFold, whose Protein Structure Database contains over 200 million protein structure predictions freely available to the scientific community. Other algorithms of this kind are ESMFold (Evolutionary Scale Modeling) and Microsoft’s MoLeR, specifically targeted to drug design.
A more recent generation of generative AI is represented by IBM’s MoLFormers UI, a family of foundation models trained on chemicals which can deduce the structure of molecules from simple representations. The MoLFormer-XL screening algorithm, for example, was trained on more than 1.1 billion unlabelled molecules from the PubChem and ZINC datasets, each represented in the SMILES notation (Simplified Molecular Input Line Entry System). As reported by IBM, MoLFormer-XL is able to predict many different physical, biophysical and physiological properties (e.g. the capacity to cross the blood-brain barrier), and even quantum properties.
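To give an idea of what a SMILES representation looks like, the Python sketch below splits a few well-known SMILES strings into tokens (atoms, bonds, branches and ring-closure digits), the kind of text-like units a chemical language model can be trained on. The regular expression is a simplified, illustrative assumption that covers only common organic-subset symbols; it is not the tokenizer used by MoLFormer-XL, and the function names are hypothetical.

```python
import re

# Simplified SMILES token pattern (an illustrative assumption, not a full
# SMILES grammar): bracket atoms, two-letter halogens, common organic and
# aromatic atoms, two-digit and one-digit ring closures, bonds and branches.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]|%\d{2}|\d|[=#\-+\\/:~@?>*$().]"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; fail if any character is unmatched."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unsupported characters in SMILES: {smiles!r}")
    return tokens


def count_atom_tokens(smiles: str) -> int:
    """Count tokens that denote atoms (bracket atoms or element symbols).

    Hydrogens are usually implicit in SMILES and are not counted here;
    an explicit bracket hydrogen such as [H] would be counted.
    """
    return sum(1 for t in tokenize(smiles) if t.startswith("[") or t.isalpha())


if __name__ == "__main__":
    examples = {
        "ethanol": "CCO",
        "benzene": "c1ccccc1",
        "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    }
    for name, smiles in examples.items():
        print(name, tokenize(smiles), count_atom_tokens(smiles))
```

In this notation, a molecule such as aspirin becomes a single short line of text, which is what makes it possible to train language-model-style algorithms on very large collections of molecules.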
Mutual Information Machine (MIM) learning is the approach used by NVIDIA to build its MolMIM model, a probabilistic auto-encoder for small molecule drug discovery. The NVIDIA BioNeMo cloud service uses such models to provide a generative AI platform for creating molecules that, according to the company, should fulfil all the properties and features required to exert the desired pharmacological activity.
Big players are not alone: many new companies were born specifically to support the creation of generative (often end-to-end) AI platforms for drug discovery. Among the main ones, Insilico Medicine’s Pharma.AI platform is being used to build a fully self-generated pipeline comprising 31 programs and 29 targets. The most advanced product under development targets the rare disease idiopathic pulmonary fibrosis and is currently in Phase 2 in the US and China. The company’s inClinico platform, an AI data-driven multimodal tool that calculates the probability of success of individual clinical trials, proved useful in predicting the outcomes of Phase 2 to Phase 3 trials and in recognising weak points in study design.
UK-based Exscientia, founded in 2012, is an AI-driven precision medicine company. Among its main achievements is the creation of the first functional precision oncology platform to successfully guide treatment selection and improve patient outcomes. The most advanced product in its pipeline is GTAEXS617, an oncology product targeting CDK7 in advanced solid tumours.
These are just a few of the main examples; you can learn more about companies focused on AI for drug discovery in these articles published in Forbes and Pharmaceutical Technologies.