IIBM & Society - iib.uam.es

Alfonso Valencia, an international reference in artificial intelligence applied to life sciences, gave a seminar at the Instituto de Investigaciones Biomédicas Sols-Morreale (IIBM), CSIC-UAM, titled "AI for Science: Digital Twins, Data and AI in Biomedicine." Valencia is an ICREA Professor and director of the Life Sciences Department at the Barcelona Supercomputing Center (BSC), where he leads research that combines artificial intelligence, massive data analysis, and biomedicine. In this interview, he reflects on the transformative moment computational biology is experiencing thanks to AI, the potential of digital twins to study diseases and accelerate scientific discovery, and the ethical, regulatory, and educational challenges that accompany these technologies. We believe his reflections are of particular interest to the entire scientific community and, especially, to young researchers beginning their careers at the intersection of biology, data, and artificial intelligence.

Wolfgang Link, head of the Molecular Mechanisms of Aging and Cancer research group within the Cancer Program at the IIBM, interviewed Alfonso Valencia. Here is the interview:

Artificial intelligence is revolutionizing many areas of science. Where does the integration of AI in biology and biomedicine stand today?

I would say we are at a historic inflection point, moving from a "proof of concept" phase into one of "widespread and deep application," particularly in science and technology. For decades, bioinformatics was based on developing statistical and ML/AI models for biological problems; for example, the first publications on protein structure prediction with neural networks date back to the 1980s. The substantial change comes with the introduction of deep learning techniques, which represent a very significant technological leap. Obvious and relevant examples are:

Protein structure: AlphaFold2 and subsequent developments have essentially solved the central part of the protein structure prediction problem, as recognized by the award of the 2024 Nobel Prize in Chemistry to Demis Hassabis and John M. Jumper.
Synthetic biology and protein design: In this same area of protein structure, methods like RFdiffusion or ProteinMPNN now allow us not only to predict but also to design proteins with new functions from scratch, expanding both practical and scientific possibilities on a scale difficult to imagine just a few years ago.
Foundation models in genomics: Models like Enformer or the Nucleotide Transformer learn the language of DNA in an unsupervised manner, capturing complex regulatory interactions that were impossible to model with previous methods, opening the doors to genome design.
Integrative omics data analysis: AI is key to integrating and making sense of the avalanche of single-cell transcriptomics, epigenomics, and proteomics data. For example, we can align gene expression profiles from millions of individual cells to build complete cellular atlases of model organisms and the human body, like those being generated by the Human Cell Atlas consortium, allowing the discovery of new cell types and transition states in diseases.
Medical imaging and digital pathology: AI is radically transforming diagnostic imaging. Deep learning models, especially deep convolutional neural networks (CNNs) and more recently transformer-based architectures, are matching and in many cases surpassing the performance of human specialists in specific tasks, for example in radiology. In digital pathology, AI is enabling the identification of histological patterns, the quantification of biomarkers, and, combined with genomics data, the prediction of tumor evolution directly from tissue morphology.
The impact on scientific development is very real, although clinical translation is much slower within the regulatory framework for medical device development.

Following advances like AlphaFold and foundation models in biology, what do you think will be the next major leap in computational biology?

In my opinion, the next major leap in science in general, and especially in biology, will be the integration of mechanistic models with AI models. That is, systems where the AI's response is constrained by biological reality (data) and interpretation (causal models). This constrains AI so that its responses do not stray from the "real" environment and, at the same time, enhances models of biological systems with AI's capabilities for construction and exploration.

In this sense, we are working on creating "virtual twins" of cellular systems. These models build on existing knowledge of signaling systems, metabolism, or gene regulation, implemented in causal terms (Boolean networks, for example). They are complex, knowledge-based models that require significant effort to build and run on HPC infrastructures. These digital twins would allow us to conduct "in silico" experiments on tumor evolution, drug response, or interactions between cell types.
The challenge is to make the development of these digital twins much easier by combining them with AI technology (particularly AI agents) to facilitate and improve all stages of the process: development, implementation, execution, monitoring, and analysis of results.
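
Editor's illustration: to make the idea of a causal model "implemented in causal terms" more concrete, the following is a minimal sketch of a Boolean network with a simulated drug perturbation. The node names, rules, and the `simulate` helper are invented for illustration; they are not taken from the models discussed in the interview, which are far larger and run on HPC infrastructures.

```python
# Toy Boolean network. All node names and rules are hypothetical.
# Each rule computes a node's next value from the current state.
RULES = {
    "GrowthSignal": lambda s: s["GrowthSignal"],  # external input, held constant
    "Drug": lambda s: s["Drug"],                  # perturbation, held constant
    "Kinase": lambda s: s["GrowthSignal"] and not s["Drug"],
    "Apoptosis": lambda s: not s["Kinase"],
    "Proliferation": lambda s: s["Kinase"] and not s["Apoptosis"],
}


def step(state):
    """Synchronous update: every node is recomputed from the previous state."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}


def find_attractor(state, max_steps=100):
    """Iterate until a state repeats and return the repeating cycle of states."""
    seen = []
    for _ in range(max_steps):
        if state in seen:
            return seen[seen.index(state):]
        seen.append(state)
        state = step(state)
    raise RuntimeError("no attractor found within max_steps")


def simulate(drug):
    """Run an 'in silico' experiment with the drug switched on or off."""
    initial = {
        "GrowthSignal": True,
        "Drug": drug,
        "Kinase": False,
        "Apoptosis": False,
        "Proliferation": False,
    }
    return find_attractor(initial)


if __name__ == "__main__":
    for drug in (False, True):
        attractor = simulate(drug)
        print("drug =", drug, "->", attractor)
```

In this toy setting, the long-term behaviour of the network (its attractor) plays the role of a cell phenotype, and toggling the `Drug` node plays the role of a perturbation experiment. Real digital twins combine many such networks with spatial and population-level simulation.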

How is bioinformatics changing research into complex diseases like cancer or rare diseases?

In cancer: Bioinformatics, driven by AI, allows us to unravel tumor heterogeneity. By analyzing single-cell sequencing data, we can identify the different subpopulations of cells within a tumor, predict trajectories, and personalize responses to perturbations (drugs). These developments are ideally combined with in vitro systems such as cell lines or organoids.
In rare diseases: AI allows us to expand data by creating synthetic data equivalent to real data. We use this type of approach both to obtain sufficient data to train AI systems and to improve – and in some cases make possible – the interpretation of real data. The challenge, obviously, lies in ensuring the quality and variability of this new synthetic data.

The use of large AI models in biomedicine raises ethical and regulatory questions. What do you consider the most urgent ones, and what kind of solutions should we promote?

As a scientist, I am enthusiastic about the potential, but I am deeply concerned about the speed at which we are advancing without the proper handbrakes. The points I find most urgent are:

Bias and equity: Models are trained on real-world data (RWD), with its biases (most available genomes come from individuals of European descent, and most animal model data comes from male mice). Furthermore, the algorithms themselves can introduce biases, as can their interpretation by third parties. If we generate models with these biases, we will perpetuate and amplify health inequalities.
Transparency and explainability: Scientists, but also clinicians and the regulatory system itself, need to know the reasons behind the results and ultimately the mechanistic explanation – the goal of biology is precisely this causal interpretation. Although there are algorithmic approaches (explainable AI methods - XAI), generative AI is intrinsically incapable of solving the causal problem, which depends on a chain of reasoning. Hence the importance of combining it with physical models.
Privacy: Foundation models must be trained with "legal" data and, in Europe, remain within the GDPR environment. Exporting patient data or data with intellectual property outside these limits is illegal and also causes considerable harm to European sovereignty.
Sovereignty: This is a fundamental associated problem. The dominant AI companies, now gigantic, are not European, which in the current geopolitical situation is a very serious problem. It is also closely tied to processor development, which is likewise not in our hands.

Do you think AI will change how future scientists and doctors are trained?

Without a doubt, and it must. We often say that the professional of the future will not be the one who competes with AI, but the one who knows how to collaborate with it.

For doctors: Their training must include the fundamentals of the systems, the limitations, and the biases of these tools. A professional must know how to collaborate with the systems that will be in their environment since, as far as we know with current systems, this collaboration is more effective than the isolated work of the professional or the AI. If this is possible, the doctor-patient relationship will be enriched and become more effective.
For computational biologists: It will no longer be enough to know how to program and know some statistics; a stronger foundation in biology will be necessary to ask the right questions, critically interpret results, and be able to work in AI agent environments. In a certain sense, I think the computational biologist will continue to be a "translator" between the world of biology and the world of AI models, but with much greater development capacity.

If we look 10–15 years ahead, how do you imagine the computational biology lab of the future?

First, I would ask myself how many of the current experimental labs will be replaced by robotized systems directly connected to AI systems that will plan, execute experiments, and with the results, plan new experiments.

For young researchers interested in bioinformatics and AI applied to biology, what advice would you give them today?

I would give them three pieces of advice: one of hope, one of caution, and one of commitment.

Build a solid bridge between two worlds: Don't specialize too soon. A scientist with a solid foundation in biology and a good command of AI tools is much more valuable than an expert in transformers who doesn't know what a cell is.
Maintain healthy skepticism: AI is an incredible tool, but we are still in a very primitive phase. It's possible that nothing we use now will be useful in a few years with a new wave of technology. This is a time of possibilities but also of acceleration and uncertainty. Learn to question the results, their origin, and their utility: Is the question important? How will it be validated? How will it be implemented?
Look towards Europe with a critical and constructive spirit: We live in a bittersweet moment. The talent in Europe is enormous, but the capacity to scale and compete with the big American tech companies is limited, due to a lack of venture capital and paralyzing bureaucracy. Don't be discouraged. We need a new generation that is not only technically excellent but also entrepreneurial and pushes to create our own industrial and research fabric, on which our future in Europe depends.

The IIBM would like to thank Prof. Alfonso Valencia for his visit, the seminar he delivered, and his willingness to share in this interview some reflections on the impact of artificial intelligence in biomedicine. We would especially like to highlight one of his statements: “The scientist of the future will be the one who knows how to collaborate with AI.”


The AI Revolution in Biology: An Interview with Alfonso Valencia