Directed evolution (DE), a strategy for protein engineering, optimizes protein properties (i.e. fitness) by expensive and time-consuming screen or selection of a large combinatorial sequence space. Machine learning-assisted directed evolution (MLDE) that screens variant properties in silico can reduce the experimental burden. However, the MLDE utilizing small experimentally labeled training data from random sampling renders low global maximal fitness hitting rates. This work introduces a cluster learning-assisted directed evolution (CLADE) framework, particularly designed for systems without high-throughput screening assays, that combines sampling through hierarchical unsupervised clustering and supervised learning to guide protein engineering. Based on general biological information, CLADE splits the genetic combinatorial space into various subspaces with heterogeneous evolutionary traits, which guides the selection of experimental sampling sets and the subsequent building up of supervised learning training sets. By virtually screening two four-site combinatorial fitness landscapes from protein G domain B1 (GB1) and PhoQ, our CLADE consistently showed near 3-fold improvement on global maximal fitness hitting rate than using randomly sampled training data. Our CLADE can be easily applied to various biological systems and customized for systems with different throughput levels to maximize its accuracy and efficiency. It promises a significant impact to protein engineering.