Background: The CLV3/ESR-RELATED (CLE) gene family encodes small secreted peptides (SSPs) and plays vital roles in plant growth and development through cell-to-cell communication. Predicting and classifying CLE genes is challenging because of their low sequence similarity.
Results: We developed a machine learning-aided method for predicting CLE genes by using a CLE motif-specific residual score matrix and a novel clustering method based on the site-weighted Euclidean distance between the 12 amino acid residues of CLE motifs. In total, 2156 CLE candidates, including 627 novel candidates, were predicted from 69 plant species. The results of our CLE motif-based clustering are consistent with previous reports that used the entire pre-propeptide. Characterization of the CLE candidates provided systematic statistics on protein lengths, signal peptides, relative motif positions, amino acid compositions of different parts of the CLE precursor proteins, and the decisive factors in CLE prediction. The approach taken here sheds light on the evolution of the CLE gene family and provides evidence that the CLE and IDA/IDL genes share a common ancestor.
Conclusions: Our new approach is applicable to SSPs and other proteins with short conserved domains, and hence provides a useful tool for gene prediction, classification and evolutionary analysis.
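
As a rough illustration of the site-weighted Euclidean distance mentioned in the Results, the sketch below compares two 12-residue motifs. It is not the authors' implementation: the one-hot residue encoding, the function names (`encode_motif`, `weighted_distance`), the weight values and the two input strings are all assumptions made for this example, since the abstract does not specify how residues are encoded or how site weights are derived.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MOTIF_LENGTH = 12                      # motif length stated in the abstract


def encode_motif(motif):
    """One-hot encode a motif: one 20-dimensional vector per site (assumed encoding)."""
    if len(motif) != MOTIF_LENGTH:
        raise ValueError(f"expected {MOTIF_LENGTH} residues, got {len(motif)}")
    encoded = []
    for residue in motif:
        vector = [0.0] * len(AMINO_ACIDS)
        # str.index raises ValueError for a non-standard residue symbol.
        vector[AMINO_ACIDS.index(residue)] = 1.0
        encoded.append(vector)
    return encoded


def weighted_distance(motif_a, motif_b, site_weights):
    """Euclidean distance in which each site's squared difference is scaled by its weight."""
    if len(site_weights) != MOTIF_LENGTH:
        raise ValueError("one weight per motif site is required")
    total = 0.0
    for weight, vec_a, vec_b in zip(site_weights, encode_motif(motif_a), encode_motif(motif_b)):
        total += weight * sum((x - y) ** 2 for x, y in zip(vec_a, vec_b))
    return math.sqrt(total)


# Illustrative 12-residue inputs and hypothetical weights (not taken from the paper).
motif_a = "RTVPSGPDPLHH"
motif_b = "HEVPSGPNPISN"
uniform_weights = [1.0] * MOTIF_LENGTH
# Hypothetical scheme that emphasises sites 4 to 9 of the motif.
core_weights = [0.5, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.5, 0.5, 0.5]

print(weighted_distance(motif_a, motif_b, uniform_weights))
print(weighted_distance(motif_a, motif_b, core_weights))
```

A pairwise distance matrix built this way could be passed to any standard clustering routine. With a one-hot encoding, each mismatched site contributes twice its weight to the squared distance, so the weights alone decide how much a substitution at a given position matters.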

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7
Posted 22 Jan, 2020