Abstract
In “Reaction classification and yield prediction using the differential reaction fingerprint DRFP”, we introduced a chemical reaction fingerprint based on the symmetric difference AΔB of two sets A and B. With DRFP, we represent a reaction as the two sets R and P, where R contains the fragments of one or more reactants and P the fragments of one or more products. The SMILES strings of the fragments in the symmetric difference RΔP are then hashed and folded into a binary vector. We evaluated DRFP-trained models on high-throughput experimentation data, where they performed at least as well as models trained on DFT-based and learned fingerprints. In this commit, we present the evaluation of DRFP-trained XGBoost and Random Forest regressors on a recently released set of Buchwald-Hartwig reactions extracted from electronic laboratory notebooks, where they outperform other methods by a wide margin. This result underlines the status of DRFP as a strong baseline for reaction representation and yield prediction.
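The core idea described above (take the symmetric difference RΔP, hash each fragment SMILES, fold the hashes into a fixed-length binary vector) can be illustrated with a minimal sketch. This is not the DRFP implementation: the function name `fold_symmetric_difference`, the choice of SHA-256 as hash function, the vector length, and the hand-written fragment sets are all assumptions made for illustration, and the fragmentation step that produces R and P from the molecules is omitted.

```python
import hashlib


def fold_symmetric_difference(reactant_fragments, product_fragments, n_bits=2048):
    """Toy sketch: hash the fragments in R Δ P and fold them into a binary vector.

    SHA-256 and the modulo folding are illustrative assumptions,
    not the hashing scheme of the published fingerprint.
    """
    # Fragments that occur on exactly one side of the reaction.
    diff = set(reactant_fragments) ^ set(product_fragments)

    fingerprint = [0] * n_bits
    for smiles in diff:
        digest = hashlib.sha256(smiles.encode("utf-8")).digest()
        # Fold the hash value into the vector length and set that bit.
        index = int.from_bytes(digest[:8], "big") % n_bits
        fingerprint[index] = 1
    return fingerprint, diff


# Hypothetical, hand-written fragment sets (not the output of a real fragmentation).
R = {"CC(=O)O", "CO", "C=O", "CC"}
P = {"CC(=O)OC", "CO", "C=O", "CC", "O"}

fp, diff = fold_symmetric_difference(R, P, n_bits=64)
print(sorted(diff))  # fragments unique to one side
print(sum(fp))       # number of set bits (at most len(diff); hash collisions can lower it)
```

Fragments shared by both sides cancel out in the symmetric difference, so only the parts of the molecules that change contribute bits. Folding by modulo can map two different fragments to the same bit, which is why the number of set bits is bounded by, but not always equal to, the size of RΔP.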
| Original language | English |
|---|---|
| Pages (from-to) | 1988-1990 |
| Number of pages | 3 |
| Journal | Digital Discovery |
| Volume | 4 |
| Issue number | 8 |
| Early online date | 3 Jul 2025 |
| DOIs | |
| Publication status | Published - 1 Aug 2025 |