The Magnificent Structures Problem: AI, the Exposome, and What We Still Don't Know

A report from the Exposome Global Forum, Sitges, Spain, April 29, 2026.

I had the privilege of moderating a panel on AI and the exposome with seven colleagues whose vantage points span the field: Mingliang (Thomas) Fang (Fudan), John House (NIEHS), Arjun (Raj) Manrai (Harvard), Heidi Hanson (Oak Ridge National Laboratory), Marc Chadeau-Hyam (Imperial College, UK), Vasilis Vasiliou (Yale), and John Ioannidis (Stanford). We spent 70 minutes together in total.

The anchoring prompt was provocative:

How would you define an exposome investigation? What are its key characteristics, and what role does AI play in this frontier?

I wanted the panel to take the definition seriously before reaching for the tools. Twenty years in, the exposome field still argues about what counts as an exposome study, though the term recently received a refresh (ref Science 2021). By lowering the cost of producing convincing-looking science, AI makes the definitional question urgent. What follows is my synthesis of where the panel landed, written with the help of AI.

A working definition of the exposome investigation

An encouraging surprise was how closely the panel converged on what an exposome investigation actually is, even though its members arrived from very different starting points. Stitched together across the opening statements, the common thread was that an exposome investigation treats exposures as cumulative, time-driven, multi-level, and integrative, and is concerned with how external factors get under the skin and shape health across the life course.

Vasilis Vasiliou pressed on rigor: he has seen too many papers claiming a “newborn exposome” from 200 metabolites or two weeks of wearable data. That is not the exposome, in his definition. Before we ask how AI advances the field, we have to be clear about what we are advancing — translation to phenotype, mechanism, causal inference, and the dynamic interaction between exposome and genome.

Marc Chadeau-Hyam operationalized the definition through its methodological signatures: high-dimensional, multi-block data; heterogeneous external and internal measurements; longitudinal structure; and a demand for embodiment and integration. He was clear that dimensionality is not the only complexity: proteomic panels with relatively few analytes can be just as hard to interpret as million-feature mass spectra. He also expressed doubt that AI could automate a pipeline from start (data collection) to finish (production of a paper).

Heidi Hanson offered a useful reframe. Until recently she had thought of the exposome as “the omic for the environment.” That definition kept breaking. What we are actually describing, she argued, is a system, one with time, multiple exposures and behaviors, causal ordering, and feedback. Treating it as a system rather than a parallel omic changes what kinds of analyses make sense, and importantly, makes AI a much better fit for the job.

John House emphasized that ExWAS is not just environmental epidemiology at high dimension, or a toxicology study with more endpoints, or a study with extra modalities bolted on. It is a study designed to characterize the cumulative, time-varying, multi-level environmental experience of individuals or populations, to connect that experience to biological response and health, and to subject the resulting claims to replication, mechanistic interrogation, and eventual action for human health.

John Ioannidis kept the definition open, including community- and cluster-level exposures, and reminded us that we may not yet know what actually matters. Maybe food frequency questionnaires and even our wearables are “prehistoric tools telling us nothing.” Maybe the molecules on the chair he was sitting on are worth a career of study. We do not yet know.

Where AI is genuinely doing new work

The panelists were not skeptics. They use these tools, and several argued that not using them now puts a researcher at a real disadvantage. Where AI is doing real scientific work:

Analytical chemistry. Mingliang made a strong case for retention-time prediction, MS/MS fragmentation prediction, and identification of unknown chemicals buried at the bottom of samples: the underlying chemistry is reproducible enough that these predictions may work.

Integration and dimensionality reduction. John House and Marc both pointed to AI’s capacity to handle heterogeneous data streams at scales humans cannot easily grasp, and to surface high-quality candidates for causal follow-up.

System-level learning. Heidi made the case that if we accept the exposome as a system, AI is uniquely suited to learning it. The most compelling near-term target may not be internal physiological twins but population-scale digital twins: representations of environments that we could perturb to make them healthier, which could become a realistic target over the next 5 to 10 years. How should these be evaluated?

Productivity compression. Heidi noted that AI has compressed her coding work from six months to a couple of weeks. Raj went further: reflecting on a Nature Medicine paper produced with our group that took nearly a decade, he argued that today's tools could have substantially automated both the analysis and the writing. Not by typing “give me a Nature Medicine paper,” but by directing modern models with deep domain expertise. A year ago this would have been science fiction.

Where AI fails, and how it fails

This is where the panel’s energy concentrated, and where the most important contribution of the session lies.

Study design and causal inference are not solved. John House said it plainly: AI generates excellent candidates, but it does not substitute for negative controls, triangulation, and external validation. Vasilis was emphatic — he tells his students and postdocs never to use AI to design a study. Tell AI what to check; never let it decide what to ask.

Pattern overproduction. John Ioannidis offered one of the panel's most arresting images. The night before, he and a colleague had tried AI on a pattern recognition problem. Within minutes they had dozens of recognized patterns, each different. “An hour ago we couldn't recognize anything; now we can write a thousand papers, maybe a million.” Theory, priors, mechanism, and causal inference have to survive this, or we will produce an enormous volume of fancy, irreproducible, meaningless research.

Bias propagation. Every algorithm encodes the biases of its training data. If we are not careful about what AI is built on, it will distort whatever the rest of us are trying to say. John House also made the point that public commercial AI tools may be optimized for objectives that are not ideal for good science: for example, spending as little compute (as few tokens) as possible on a response that makes the user feel good, rather than being exhaustive and thorough.

Sycophantic confidence in misleading output. Raj's framing, borrowed from a remark John Ioannidis had made earlier in the day, became the panel's anchor metaphor: modern AI is “wonderful at building magnificent structures out of thin air,” and it does so more convincingly than any tool of the past 20 or 30 years. The convincingness is the problem.

The “unbelievably-wrong” case. John Ioannidis pushed this further than anyone else dared. He has seen colleagues venture into AI on exposome-style questions and produce outputs that were “more wrong than anything I could imagine.” The smartest people, the best technology, the richest datasets, and conclusions that were unbelievably, incapacitatingly wrong. He did not dismiss the possibility that he was the one missing something.

Skill atrophy or power-up? Marc told the panel about students who used AI to write scripts to analyze simulated omics data. The scripts looked fine. When the team tried to run them on real data, there were bugs that would have taken months to find, and the students could no longer debug their own work. The tool had displaced the skill that would let them recognize what the tool had done wrong. Mingliang pushed back: we must start somewhere, he said, and AI models have helped level the skills playing field for trainees who might not otherwise have had access to training.

Data quality and imbalance, especially in toxicology. Mingliang reminded us that computational toxicology, while mature in pharma, is still emerging in environmental science. Endpoints like aryl hydrocarbon receptor (AhR) activity have abundant data; most others do not. AI does not solve a data problem; it inherits one.

Education and incentives — where the panel disagreed

This was the one place where the panel did not fully converge, and the disagreement was productive.

Raj Manrai argued that our existing training paradigm largely works. Who wins in the pre-AI system? Scientists who develop deep domain expertise alongside computational fluency, and that pairing is exactly what lets you tell sycophantic AI output from value-additive work. Committee after committee on “AI in the curriculum” will not figure this out. Continued emphasis on mechanism, pathophysiology, and the multi-level structure of the science is what builds the taste required to wield these tools well.

John Ioannidis pushed back. Education has been inefficient for a long time; much of what we teach is wasted. We are not well prepared for the onslaught of new options, and academia is not where the frontier capability is being built — no university has even one percent of the compute of the big tech companies driving these models. Beyond education, he argued that incentives are the deeper lever. If we continue rewarding salami-sliced papers and maximized CV length, AI will simply accelerate the existing pathologies. Intent sits behind use, and intent responds to incentives.

Marc landed somewhere in between. These models are a tool, not a culture. They look like a friend that says nice things, but they are not your friend. The community’s job is to understand how the models work well enough to critically appraise their output — to know what they were trained on and what they have digested before deciding what to do with what they say. And, crucially: use AI to ask questions about your specific research, never to ask questions in general.

Vasilis added the cleanest framing of the educational task: go back to definitions first. Define the exposome, define the disease, define the mechanism, define the translation. Then reach for AI. The tool should follow the question, not produce it.

The sharpest lines of the panel

I will give the closing words to John Ioannidis: the smartest people, the best AI, and the richest datasets will produce outputs more wrong than anything else, a false-positive soup.

That sentence captures the central tension the field now faces. The tools have raised the floor such that anyone can now produce a credible-looking exposome paper, but they have not yet raised the ceiling, where causal reasoning, mechanism, and translation live. Worse, by making the floor look like the ceiling, they put real pressure on the very things that distinguish good exposome science from bad: deep measurement, deep design, deep domain knowledge.

The encouraging news from the panel is that we know what the calibration layer needs to contain, and it applies not only to exposomics but to all of the biomedical sciences. Definitions first. Mechanism before pattern. Training that produces both computational fluency and domain depth, rather than one without the other. Incentives that reward reproducibility and translation, not just throughput.

The exposome is finally taking shape as a coherent scientific category: a system, cumulative and multi-level, that links external environment to human health across a lifetime. AI is probably going to be central to investigating it. Whether that investigation actually saves lives, or just generates magnificent structures from thin air, is up to us.