The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning

Boyne, Toby and Campos, Juan and Langdon, Becky and Qing, Jixiang and Xie, Yikin and Zhang, Shiqiang and Tsay, Calvin and Misener, Ruth and Davies, Daniel and Jelfs, Kim and Boyall, Sarah and Dixon, Thomas and Schrecker, Linden and Folch, Jose (2025) The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning. In: NeurIPS 2025: The 39th Conference on Neural Information Processing Systems, 2025-11-28 - 2025-12-05, San Diego Convention Center.

Full text not available from this repository.

Abstract

Machine learning has promised to change the landscape of laboratory chemistry, with impressive results in molecular property prediction and reaction retrosynthesis. However, chemical datasets are often inaccessible to the machine learning community, as they tend to require cleaning or a thorough understanding of the chemistry, or are simply not available. In this paper, we introduce a novel dataset for yield prediction, providing the first-ever transient flow dataset for machine learning benchmarking, covering over 1200 process conditions. While previous datasets focus on discrete parameters, our experimental set-up allows us to sample a large number of continuous process conditions, generating new challenges for machine learning models. We focus on solvent selection, a task that is particularly difficult to model theoretically and therefore ripe for machine learning applications. We showcase benchmarking for regression algorithms, transfer-learning approaches, feature engineering, and active learning, with important applications towards solvent replacement and sustainable manufacturing.

Item Type:
Contribution to Conference (Poster)
Journal or Publication Title:
NeurIPS 2025: The 39th Conference on Neural Information Processing Systems
ID Code:
236062
Deposited By:
Deposited On:
25 Mar 2026 10:01
Refereed?:
Yes
Published?:
Published
Last Modified:
25 Mar 2026 23:15