In recent years, the integration of machine learning techniques into chemical reaction product prediction has opened new avenues for understanding and predicting the behaviour of chemical substances. The need for such predictive methods stems from growing regulatory and social awareness of the environmental consequences of persistent and accumulating chemical residues. Traditional biodegradation prediction methods rely on expert-curated knowledge. However, creating this knowledge is becoming prohibitively expensive as newer datasets grow in complexity and diversity, leaving existing methods unable to make predictions on them. We formulate the product prediction problem as a sequence-to-sequence generation task, taking inspiration from natural language processing and other reaction prediction tasks. In doing so, we reduce the need for the expensive manual creation of expert-based rules.
Scientific Contribution: We present the first study of the transformer's ability to predict biodegradation reactions. Our proposed method predicts biodegradation reactions more accurately and efficiently, and for more compounds, than existing methods. We also contribute an evaluation framework for transformer-based product prediction methods that better illustrates a method's performance and is better suited to comparison with other methods.