Chimera: Accurate retrosynthesis prediction by ensembling models with diverse inductive biases

Maziarz, Krzysztof, Liu, Guoqing, Misztela, Hubert, Kornev, Aleksei, Hoefling, Holger, Fortunato, Mike, Gupta, Rishi and Segler, Marwin (2024) Chimera: Accurate retrosynthesis prediction by ensembling models with diverse inductive biases. arxiv.org.

Abstract

Planning and conducting chemical syntheses remains
a major bottleneck in the discovery of functional
small molecules, and prevents fully leveraging generative
AI for molecular inverse design. While early work
has shown that ML-based retrosynthesis models can
predict reasonable routes, their low accuracy for less
frequent, yet important reactions has been pointed
out. As multi-step search algorithms are typically
limited to reactions predicted as likely by the underlying
model, the applicability of those tools is inherently
constrained by the accuracy of the retrosynthesis
prediction model. Inspired by how chemists use
different strategies to ideate reactions, we propose
Chimera: a meta-framework for building highly accurate
reaction models that combine predictions from
diverse base models with complementary inductive biases
using a learning-based ensembling strategy. We
instantiate the framework with two newly developed
components, which themselves are state of the art in
their categories. Through experiments across several
orders of magnitude in data scale and time-splits, we
show that Chimera outperforms all major models by a
significant margin, owing both to the good individual
performance of its constituents and to the scalability
of our ensembling strategy. Moreover, we find
that PhD-level organic chemists significantly prefer
routes by Chimera over all tested baselines in terms of
quality. Finally, we transfer the checkpoint trained on
the largest data scale to an internal reactions dataset
of a major pharmaceutical company, showing robust
generalization under distribution shift. With the new
dimension that our meta-framework introduces, we
anticipate further acceleration in the development of
even more accurate models.

Item Type: Article
Date Deposited: 16 Jan 2025 00:45
Last Modified: 16 Jan 2025 00:45
URI: https://oak.novartis.com/id/eprint/55725
