Machine learning-based catalyst design relies directly on the underlying models. This study investigated to what extent these models, and their output for catalyst design (i.e., the key catalyst properties), depend on data characteristics such as data volume, number of features, design space, correlation between features, and experimental uncertainty. Both an in silico and an experimental catalytic dataset were explored. Every data characteristic was found to affect the model performance, the most important catalyst properties identified, or both. In particular, sufficiently large experimental datasets, as well as uncorrelated and well-designed experimental variables (i.e., catalyst or operating conditions), are rare in the literature, although they are not particularly difficult to obtain in practice. Acquiring new data with the identified characteristics would better ensure sound models, although case-specific investigations may also show existing data to be useful.