Predector: an automated and combinative method for the predictive ranking of candidate effector proteins of fungal plant-pathogens


 ‘Effectors’ are a broad class of cytotoxic or virulence-promoting molecules that are released from plant-pathogen cells to cause disease in their host. Fungal effectors are a core research area for improving host disease resistance; however, because they generally lack common distinguishing features or obvious sequence similarity, discovery of effectors remains a major challenge. This study presents a novel tool and pipeline for effector prediction - Predector - which interfaces with multiple software tools and methods, aggregates disparate features that are relevant to fungal effector proteins, and ranks effector candidate proteins using a pairwise learning to rank approach. Predector outperformed alternative effector prediction methods that were applied to a curated set of confirmed effectors derived from multiple species. We present Predector as a useful tool for the prediction and ranking of effector candidates, which aggregates and reports additional supporting information relevant to effector and secretome prediction in a simple, efficient, and reproducible manner. Predector is available from https://github.com/ccdmb/predector and associated data from https://github.com/ccdmb/predector-data.


Introduction
'Effectors' are a broad class of cytotoxic or virulence-promoting molecules that are released from pathogen cells to cause disease in their host. Fungal effectors are a core research area for improving host disease resistance; however, because they generally lack common features or obvious sequence similarity, discovery of effectors is non-trivial [1][2][3] . Secreted effector proteins of plant pathogens have been studied more comprehensively in the Oomycetes (a separate lineage of filamentous microbes), in which in silico identification of effectors is more feasible compared to fungi, as they exhibit highly conserved sequence motifs (e.g. RXLR, LXLFLAK) 4,5 . Effector prediction in fungal genomes may be more challenging as these genomes are highly plastic, commonly exhibiting accelerated mutation rates, fungal-specific genome-wide mutagenesis mechanisms, e.g. repeat-induced point mutation (RIP) 6,7 , as well as increased rates of chromosome structure rearrangement 8,9 and lateral gene transfer 10 . Consequently, fungal effectors are highly diverse in sequence and function, and much effector candidate discovery is performed using experimental techniques such as phenotype association and comparative genomics [11][12][13][14] , transcriptomics [15][16][17] , proteomics 18,19 and GWAS 20,21 . There are, however, some protein characteristics - i.e. structural features (e.g. functional domains), signal peptides, and amino-acid frequencies - that can be used as an alternative to simple homology searches. Several methods using these characteristics have been developed to prioritise effector candidates for experimental validation 2 .
In-silico effector prediction of small secreted proteins (SSPs) has typically involved ad hoc hard-set criteria such as a signal peptide, no transmembrane domains outside the signal peptide, small overall size (often < 300 AA), and a high number of cysteine amino-acids. These thresholds were based on the properties of early discovered effectors; however, numerous known effectors do not conform to this profile (Supplementary Table 1). The use of simple hard filters risks excluding these proteins from candidacy. Signal peptide prediction is the most common in-silico technique used to refine effector candidates from proteomes 22 , with SignalP the most common prediction tool [23][24][25] , although other tools are frequently used in combination 26,27 , and different tools can perform better or worse with different protein groups or organisms 22 . Subcellular localisation prediction tools such as TargetP 23 or DeepLoc 28 are also frequently used to predict the location of proteins. Their reliability for predicting protein secretion is questionable 22 , but proteins predicted to be localised in organelles might reasonably be excluded.
Because most effectors are expected to be free in the extracellular space or host cells, transmembrane domains (TM) are also an important feature for excluding candidates, commonly predicted using TMHMM 29 or Phobius 26 .
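The classical hard-filter approach described above can be sketched as follows. This is a minimal illustration only: the dictionary fields, the cysteine threshold, and the example protein are hypothetical, and a real pipeline would derive these values from SignalP/TMHMM output.

```python
def is_ssp_candidate(protein):
    """Apply typical ad hoc hard filters for small secreted proteins (SSPs)."""
    sequence = protein["sequence"]
    mature = sequence[protein["signal_peptide_end"]:]  # sequence after the signal peptide
    return (
        protein["has_signal_peptide"]                   # predicted signal peptide
        and protein["tm_domains_after_sp"] == 0         # no TM domains outside the SP
        and len(sequence) < 300                         # small overall size
        and mature.count("C") / max(len(mature), 1) >= 0.03  # cysteine-rich
    )

# A toy 100-residue protein with a 20-residue signal peptide and 4 cysteines.
candidate = {
    "sequence": "M" + "A" * 40 + "C" * 4 + "K" * 55,
    "signal_peptide_end": 20,
    "has_signal_peptide": True,
    "tm_domains_after_sp": 0,
}
print(is_ssp_candidate(candidate))  # True
```

As noted above, numerous known effectors fail one or more of these criteria, which is why such binary filters risk discarding genuine candidates.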
Recently developed machine learning tools tailored to predicting effector-like properties have presented new opportunities for improving effector prediction pipelines. EffectorP 30,31 and FunEffector-Pred 32 use amino acid frequencies, molecular weight, charge, AA k-mers, and other protein characteristics to predict effector-like proteins directly. In combination with secretion prediction, tools like EffectorP and FunEffector-Pred may be a more robust alternative to simple hard filters. Tools that predict effector localisation within host cells, such as LOCALIZER 33 , are also growing in their relevance 39 . Secondary and tertiary structural modelling and similarity searches against known effectors are not commonly used for high-throughput effector discovery, but this could yet become an important component of future effector prediction pipelines 2 .
Current effector prediction pipelines face two major challenges: 1) the necessity of reducing 10-20 thousand proteins per genome down to a set of effector candidates that is both reliable and within a number that is feasible for experimental validation, and 2) the amalgamation of outputs from a large and diverse range of bioinformatics tools and methods, for both prediction and informative purposes. Fungal genome datasets typically contain thousands of secreted proteins, of which hundreds of SSPs may be predicted by standard methods 2 . Further filtering or ranking based on supporting data from GWAS, RNAseq, positive selection, or comparative genomics can still generate hundreds of candidates [40][41][42][43] . The prioritisation of effector candidates based on simple biochemical properties is, therefore, still relevant to effector prediction. Furthermore, there is little consensus on how to combine multiple analyses 22 , and the common use of multiple successive hard filters risks increasing the error with each step, causing good candidates to be excluded.
Rank-based methods are a simple way to avoid exclusion of candidates lacking clearly discriminative features, by assigning weighted scores to features that are presumed to be important in determining effector-likelihood, and summing these into a single score that is used to rank candidates 43 . However, these simple combinations of manually assigned feature weights may still fail to place proteins with uncommon characteristics near the top of the list. More sophisticated ranking decisions may come from a group of machine learning techniques called "learn to rank". Rather than offering a binary classification (i.e. effector or non-effector), these methods attempt to order elements optimally so that relevant elements are nearer the beginning of the list. Although these algorithms are most often employed in search engine and e-commerce websites, they have been used successfully to combine diverse sources of information and rank protein structure predictions 44 , remote homology predictions 45 , gene ontology term assignments 46 , and predicting protein-phenotype associations in human disease 47 .
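The simple weighted-score ranking described here can be sketched as follows; the feature names and weights are hypothetical, chosen only to illustrate summing manually assigned feature weights into a single sortable score.

```python
# Hypothetical feature weights (a real scheme would use many more features).
WEIGHTS = {"signal_peptide": 2.0, "effector_like": 3.0, "tm_domains": -2.0}

def manual_score(features):
    """Sum of weight-adjusted feature values, used as a single ranking score."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

proteins = {
    "protA": {"signal_peptide": 1.0, "effector_like": 0.9, "tm_domains": 0},
    "protB": {"signal_peptide": 1.0, "effector_like": 0.2, "tm_domains": 2},
}
# Sort candidates by descending score rather than excluding any outright.
ranked = sorted(proteins, key=lambda p: manual_score(proteins[p]), reverse=True)
print(ranked)  # ['protA', 'protB']
```

A "learn to rank" model replaces these hand-chosen weights with weights (or trees) learned from labelled examples.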
In this study, we present a novel tool and pipeline for effector prediction - Predector - which interfaces with multiple software tools and methods, aggregates disparate features that are relevant to fungal effector proteins, and ranks effector candidate proteins using a pairwise learning to rank approach. Predector simplifies effector prediction workflows by providing simplified software dependency installation, a standardised pipeline that can be run efficiently on both commodity hardware and supercomputers, and user-friendly tabular formatted results. In this study, we compare the performance of Predector against a typical effector prediction method (i.e. signal peptide prediction, transmembrane domain prediction, and EffectorP), on a curated set of confirmed effectors derived from multiple species. While the small number of currently known effectors and relatively loose definition of the group precludes the possibility of perfectly precise effector prediction tools, we present Predector as a tool enabling useful effector candidate ranks alongside supporting information for effector and secretome prediction in a simple, efficient, and reproducible manner.

Results
To develop and evaluate the Predector pipeline, a dataset of unprocessed fungal proteins was collected and split into train and test datasets (Supplementary Table 2). The datasets included redundancy-reduced proteins of known fungal effectors (train: 125, test: 28), fungal proteins in the SwissProt database annotated as secreted (train: 256, test: 64) and non-secreted (train: 8676, test: 2169), and the whole proteomes from 10 well-studied fungal genomes (train: 52224, test: 13056). The Predector pipeline runs numerous tools related to effector and secretome prediction (Table 3). Benchmarking those tools against the set of confirmed effector proteins in the train dataset, it was observed that the secretion prediction tools were frequently correct, with a small number of exceptions (Fig. 1). Signal peptide prediction recall in the training dataset of known effectors ranged from 84% (DeepSig) to 92% (TargetP 2). SignalP 3, 4, 5, and Phobius generally predicted about 90% of effectors to have signal peptides (Fig. 1). Transmembrane (TM) predictors were, as expected, generally not able to predict TM domains in confirmed effectors, with the few single TM predictions by TMHMM or Phobius likely to be mispredictions within N-terminal signal peptides. In the case of TMHMM, all effectors with at least one TM domain had more than ten AAs predicted to be TM-associated in the first 60 residues (Supplementary File 1:39). Effector prediction tools (EffectorP 1 and 2) were also able to predict most, but not all, of the confirmed effector set. EffectorP correctly predicted 85.6% and 76.8% of effectors in the training dataset for versions 1 and 2, respectively. Evaluation of protein features that might allow for distinction between the different protein classes considered in this study (effectors, effector homologues, secreted proteins, non-secreted proteins, and unlabelled proteomes) identified twelve features that could be used effectively.
These included: the proportion of cysteines, small, non-polar, charged, acidic, and basic amino acids; ApoplastP prediction; DeepLoc extracellular or membrane predicted localisations; molecular weight; EffectorP scores; and signal peptide raw scores (see Supplementary File 1:3-40). To incorporate information from the selected features related to effector and secretion prediction, a pairwise learning to rank model was trained. The mean cross-validated normalised discounted cumulative gain (NDCG) in the top 500 ranked predictions (NDCG@500) for the hyper-parameter optimised model was 0.925942 with standard deviation 0.009421, indicating high performance and little effect of substructure within the dataset. The mean NDCG@500 for the train sets within the cross validation was 0.886542 (0.015099), indicating that the model was not overfitting.
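As a concrete reference for the NDCG@k metric used above, a hand-rolled sketch with toy relevance labels (1 = effector, 0 = other) rather than the study's data:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: each relevance divided by log2 of its rank + 1."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(y_true, y_score, k):
    """DCG of the score-sorted list, normalised by the best achievable DCG."""
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    dcg = dcg_at_k([y_true[i] for i in order], k)
    ideal = dcg_at_k(sorted(y_true, reverse=True), k)
    return dcg / ideal if ideal > 0 else 0.0

# Two "effectors" among four proteins; one effector is ranked third.
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.8, 0.3, 0.1]
print(round(ndcg_at_k(y_true, y_score, k=4), 4))  # 0.9197
```

A perfect ordering (all effectors first) would give an NDCG of 1.0.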
Benchmarked against a test set of confirmed effectors (Fig. 2), the Predector model consistently gave higher scores to effector proteins, and also to homologues of confirmed effectors (on which the model was not trained). Secreted proteins from SwissProt tended to have intermediate scores centred around 0. Non-secreted proteins and the unlabelled proteomes were heavily skewed towards more negative scores, with a long tail that included some proteins with high scores (which in the case of the proteomes was expected, as this dataset was unlabelled). The test and train sets showed similar distributions of scores, though scores for known effectors tended to be slightly lower in the test set.
The main features used for sorting effectors from non-effectors in the Predector model were TargetP secretion prediction, SignalP 3-HMM S-scores, SignalP 4 D-scores, DeepLoc extracellular and membrane predictions, and EffectorP 1 and 2. TargetP secretion was overwhelmingly the most important feature according to the gain metric (the average increase in predictive score when the feature is used), which was consistent with the observation that it was the most sensitive of the signal peptide prediction methods for effectors (Fig. 1). The most commonly used predictors were EffectorP 2 pseudo-probabilities, molecular weight, and the proportions of cysteines, basic AAs, non-polar AAs, and tiny AAs. Feature importance and the boosted trees indicated overall that the Predector model first coarsely sorts proteins into the predicted secretome and non-secreted proteins, then proceeds to separate proteins with effector-like properties from the remainder of the secretome using more decision nodes, each with smaller overall gain (Supplementary File 1:43).
Predector separated proteins predicted to be secreted (i.e. with a signal peptide and fewer than two TM domains) from those that are not (Fig. 3). Most "non-secreted" proteins have a score < 0, while a trimodal distribution of "secreted" proteins was observed, which spanned the full range of scores and roughly coincided with the distributions of the effectors/homologues, SwissProt secreted, and non-secreted/proteome datasets (Fig. 2). This contrasted with EffectorP predictions (which was trained on, and is intended to be used on, secretomes only), which gave poor separation of non-secreted and secreted proteins. EffectorP 1 showed a high bias towards predicting proteins as either 0 or 1, indicating that it may be unsuited for ranking and should only be used as a decision classifier with a score threshold of 0.5. EffectorP 2 showed a more continuous separation of known effectors, and was moderately correlated with Predector scores for secreted proteins.
Predector consistently outperformed EffectorP 1 and 2 (restricted to the predicted secretome, as per intended usage) in classification recall and Matthews correlation coefficient, and in metrics assessing the ranked order of effector candidates (Table 1, Supplementary Table 4). While EffectorP was not optimised for effector candidate ranking or intended to be used this way, we note that its probability score can often be mis-used for this purpose. Conversely, although Predector was not intended to be used for effector classification, we also compared its predictive performance with EffectorP 1 and 2 on the secreted subset, and on the full dataset using the joint estimator of secretion and EffectorP score > 0.5. For the purpose of this comparison, a minimum Predector score of 0 was selected as a classification threshold, based on the observation that the model assigns positive scores to effector-associated branches in the trees (and negative scores to non-effector-associated branches). EffectorP 1 and 2 performed identically in terms of effector classification on our test dataset, and gave highly similar results on the training dataset (Supplementary Table 4, Supplementary File 1:50), although fewer false positives were reported by EffectorP 2. Predector correctly predicted all but two effectors in the full test set, and all but one in the secreted test subset. In contrast, EffectorP 1 and 2 both mis-classified six effectors in the secreted subset, and two known effectors in the test dataset were not predicted to be secreted and thus would have been excluded from prediction by an EffectorP pipeline. Predector also correctly predicted as an effector one confirmed effector (AvrSr35) that was not predicted to be secreted. Although Predector, not being optimised for classification, had a higher false positive rate than EffectorP 1 and 2, it compared favourably on the MCC metric, which is considered more reliable for unbalanced datasets 48 .
It is worth noting that in this study secretion prediction incorporates multiple methods, whereas many studies rely on a single prediction tool; thus the proportion of potentially missed effector candidates may be higher than we report here. Pipeline runtime was benchmarked on 16 CPUs. DeepLoc was overwhelmingly the longest-running task (~ 75 mins for 5000 proteins), with most of the 16 CPUs left idle while waiting for DeepLoc to finish.

Discussion
The Predector pipeline unites, for the first time, numerous computational tasks commonly involved in effector and secretion prediction to determine a ranked set of candidate effectors from unprocessed (immature) proteins, simplifying complex data gathering steps. The effector ranking model run as part of Predector provides additional benefits over the standalone use of its composite tools, combining their individual strengths while being less prone to their weaknesses. It was observed that while the most recently updated effector prediction tool available - EffectorP 2 30 - performed well as a very specific classifier, it still missed several confirmed effectors. The preliminary step of secretion prediction can also be error-prone, and the combined false positives from both effector and secretion prediction methods, coupled with their common implementation as hard filters, may result in many genuine candidate effectors being discarded. For this reason, we propose that ranking and clustering methods should be preferred over hard filters for prioritising effector candidates.
In terms of effector candidate ranking, EffectorP 2 performed reasonably well for ordering confirmed effectors based on probability score, but was not designed to be used in this way. Predector maintained higher recall with higher scores (Supplementary File 1:46 & 47, Table 1) and achieved comparable or better precision than EffectorP 2 alone for higher effector scores. Thus, while Predector is not intended to be used as a classifier, we demonstrate its utility as a highly sensitive method for combined secretion and effector prediction, and suggest a decision threshold (score) of 0 for summarisation purposes alongside standard EffectorP and secretion classifiers (which can be obtained from Predector output). However, the appropriate threshold may change with future versions. Although the recall scores for Predector were very high, Predector also predicted 292 more false positives in the test dataset than the commonly used method of combining a predicted secretion hard filter with EffectorP 2 (Table 1). We argue that recall should be prioritised for effector prediction, as the unlabelled proteome datasets used here may contain genuine novel effectors, and the focus of Predector on ranking rather than classification mitigates some of the issues associated with lower precision. Encouragingly, we observed that Predector was capable of giving positive scores to known effectors which were not predicted to have a signal peptide (in both the train and test datasets) and which thus would have failed to be predicted by alternative methods with a secretion prediction hard filter.
The predictive rankings provided by Predector are complemented with additional information that can be used to manually evaluate groups of effector candidates, and represent a comprehensive summary of various predicted types of proteins within a fungal proteome dataset, including candidate pathogenicity effectors, effector homologues, predicted secreted proteins, and carbohydrate-active enzymes (CAZymes) 49 . Predector reports the results of database searches against PHI-base, a curated set of known fungal effectors, Pfam domains, and dbCAN HMMs. We recommend that users examine the functionally annotated candidates closely, particularly with respect to homologues of confirmed effectors, prior to consideration of candidates ranked by Predector scores. Similarly, supplementation with experimental evidence or information derived from external tools and pipelines will further improve the utility of Predector outputs, e.g. selection profiles derived from pan-genome comparisons 43,50 , presence-absence profiles in comparative genomics, genome-wide association studies, differential gene expression, or pathogenicity-relevant information relating to the genomic landscape: the distance to DNA repeats, telomeres, or distal regions of assembled sequences 8,51 ; or codon adaptation. By selecting indicators of general effector properties or molecular interactions of interest, and sorting these lists first by those functionally-guided features and then by Predector score(s), users gain a rich and clear guide for prioritising candidates before proceeding to more resource-expensive experiments (e.g. cloning or structure modelling).
Among known effectors there is considerable diversity of molecular roles and functions. The modern plant pathology community has yet to come to firm agreement on the broad definition of an effector beyond the gene-for-gene and inverse gene-for-gene models, or to refine a broader definition with effector sub-types. Effectors may promote virulence through directly targeting and disrupting host cell biological processes, including ribogenesis, photosynthesis, or mitochondrial activity. In contrast, various extracellular chitin-binding proteins have also long been described as effectors, yet promote virulence through passively protecting the pathogen cell from host PAMP and DAMP recognition. CAZymes are not typically considered to act as effectors, yet there are several examples of secreted CAZymes that are reported as virulence factors or may be recognised by host major resistance genes 38 . Furthermore, the focus of many effector prediction methods (including Predector) on biochemical or functional aspects of effector proteins also neglects the crucial contribution of host R- and S-proteins in gene-for-gene interactions, which must be determined experimentally. An inclusive predictive model spanning diverse effector types may not offer a reliable pathway to rapid effector identification; rather, such models are likely to focus on general biochemical properties unrelated to necrotrophic or avirulence activities, e.g. properties that would enable the majority to interact with membranes and translocate into a host cell or to function in the apoplast. We present Predector as a reasonable compromise between functional diversity and common purpose, accounting for this inherent diversity through incorporation of multiple predictive methods. Additionally, with rapidly decreasing costs of genome sequencing and improvements to the automation of genome analysis and gene feature annotation, the availability and utility of fungal pathogen genomes is steadily increasing 52 .
There is a growing need for tools which minimise the effects of poor data quality control and ensure reproducibility and comparability across multiple genome resources. The Predector pipeline is an important time-saving tool which applies a standardised and reproducible set of tests for effector prediction.

Pipeline implementation
The Predector pipeline runs a range of commonly used effector and secretome prediction bioinformatics tools on a complete predicted proteome, accepted as input in FASTA-formatted files (Table 3), and combines all raw and summarised outputs into newline-delimited JSON, tab-delimited text, and GFF3 formats. The pipeline is implemented in Nextflow (version >20) 53 , and a conda environment and Docker container are available for easy installation of dependencies, with scripts to integrate user-downloaded proprietary software into these environments. Predector is available from https://github.com/ccdmb/predector.

Datasets
The training and evaluation datasets consisted of: confirmed fungal effectors, fungal proteins with confirmed subcellular localisation, and an 'unlabelled' fungal protein set derived from whole proteomes of well-annotated, model fungal species. The experimentally-confirmed effector protein dataset was curated from literature, PHI-base 38 , and EffectorP 30,54 training datasets (Supplementary Table 2). Effector homologues were also identified from literature (Supplementary Table 2). Fungal proteins with experimentally annotated subcellular localisation were downloaded from UniProtKB/SwissProt (version 2020_01, downloaded 2020-06-01), and were labelled "secreted" (non-transmembrane) or "non-secreted" (membrane associated, endoplasmic reticulum localised, Golgi localised, and glycosylphosphatidylinositol (GPI) anchored). UniProtKB download queries are provided in Supplementary materials. The unlabelled protein set was derived from the whole proteomes of 24 well-annotated fungal species, including the saprotroph (or opportunistic monomertroph/biotroph) Neurospora crassa 71 (Supplementary Table 2). Fourteen of the 24 proteomes were retained as a separate dataset for final evaluation (Supplementary Table 2). The remainder of the datasets were combined, and redundant sequences were removed to prevent the undue influence of conserved or well-studied sequences with multiple records.
Redundancy was reduced by clustering proteins with MMseqs2 version 11-e1a1c 55 , requiring a minimum reciprocal coverage of 70% and minimum sequence identity of 30% (--min-seq-id 0.3 --cov-mode 0 -c 0.7 --cluster-mode 0). A single sequence was chosen to represent each set of clustered, redundant sequences, prioritised based on supporting information (in order of preference): known effector, SwissProt secreted, SwissProt non-secreted, proteome/effector homologue, longest member of cluster. Clusters that corresponded to the known effectors from the EffectorP 2 30 training and test data sets were automatically assigned to the training and test data sets in this study. A randomly selected subset of 20% of the remaining representative cluster members was also assigned to the test dataset. Data and scripts for generating the datasets are available at https://github.com/ccdmb/predector-data.
Manual effector and secretion prediction scoring
Predicted proteins were ranked using the sum of several weight-adjusted scores derived from a range of software and methods (Table 3, Supplementary Table 3). A protein was annotated as "multiple_transmembrane" if it was assigned more than one transmembrane (TM) domain by either TMHMM or Phobius, and "single_transmembrane" if it was assigned one TM domain by TMHMM or Phobius (and neither predicted more than one). For TMHMM "single_transmembrane", we add the additional constraint that, if there is a signal peptide prediction (by any method), the number of expected TM AAs in the first 60 residues must be less than ten. A protein was annotated as "secreted" if it was predicted to have a signal peptide by any method and was not annotated as a multiple transmembrane protein.
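The TM annotation rules above can be sketched as follows; the function signature and field names are illustrative, not Predector's actual output schema.

```python
def tm_annotation(tmhmm_tm, phobius_tm, has_signal_peptide, tmhmm_exp_aa_first60):
    """Classify a protein's TM status from TMHMM and Phobius TM-domain counts."""
    if tmhmm_tm > 1 or phobius_tm > 1:
        return "multiple_transmembrane"
    if tmhmm_tm == 1:
        # For TMHMM single-TM calls with a predicted signal peptide, require
        # fewer than 10 expected TM residues in the first 60 AAs; otherwise the
        # TM call likely overlaps the signal peptide and is discounted.
        if not has_signal_peptide or tmhmm_exp_aa_first60 < 10:
            return "single_transmembrane"
        return "none"
    if phobius_tm == 1:
        return "single_transmembrane"
    return "none"

print(tm_annotation(1, 0, True, 25))  # TM call overlaps the signal peptide -> none
```

The "secreted" annotation then combines any positive signal peptide prediction with the absence of a "multiple_transmembrane" annotation.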
Protein matches to PHI-base were summarised based on the experimental phenotypes of the matched proteins. Proteins were marked as a "phibase_effector_match" if they had any matches with the "Loss of pathogenicity", "Increased virulence (Hypervirulence)", or "Effector (plant avirulence determinant)" phenotypes; as a "phibase_virulence_match" if they had any matches with the "Reduced virulence" phenotype and none of the effector phenotypes; and as a "phibase_lethal_match" if they had any matches with the "Lethal" phenotype. Proteins were also labelled as "effector_match", "pfam_match", or "dbcan_match" if they had a significant match to a custom database of known effectors, selected virulence-associated Pfam HMMs, or selected virulence-associated dbCAN HMMs, respectively (Supplementary Table 2).
Each protein was given two manually designed scores to evaluate effector or secreted protein candidates, based on the values and weights in Supplementary Table 3. The secretion score is the sum of the products of value and weight for the transmembrane, secreted, signalp3_hmm, signalp3_nn, phobius, signalp4, deepsig, targetp, and deeploc parameters. The effector score is the sum of the secretion score and the products of the values and weights of the EffectorP and homology parameters (effector match, virulence match, and lethal match).
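The two scores can be sketched as follows; the weights shown are hypothetical placeholders (the actual values and weights are those of Supplementary Table 3), and only a subset of the parameters listed above is included for brevity.

```python
# Hypothetical weights; the real values are given in Supplementary Table 3.
SECRETION_WEIGHTS = {
    "transmembrane": -2.0, "secreted": 2.0, "signalp4": 1.0,
    "targetp": 1.0, "deeploc": 1.0,
}
EFFECTOR_WEIGHTS = {"effectorp": 3.0, "effector_match": 2.0}

def secretion_score(values):
    """Sum of value x weight over the secretion-related parameters."""
    return sum(w * values.get(k, 0.0) for k, w in SECRETION_WEIGHTS.items())

def effector_score(values):
    """Secretion score plus the weighted EffectorP and homology parameters."""
    return secretion_score(values) + sum(
        w * values.get(k, 0.0) for k, w in EFFECTOR_WEIGHTS.items())

values = {"secreted": 1.0, "signalp4": 0.8, "targetp": 0.9, "effectorp": 0.7}
print(round(secretion_score(values), 2), round(effector_score(values), 2))  # 3.7 5.8
```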
Learning to rank model training
A "learning to rank" pairwise machine learning method based on LambdaMART 72 was developed using XGBoost 73 to prioritise effectors. Effector homologues in the training data set were held out as an informal validation set; known effector proteins were considered relevant (priority 2), and all other proteins in the train dataset were considered irrelevant (priority 1). Model hyper-parameters were optimised by cross-validation, and the final model was trained using the optimised hyper-parameters.
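The pairwise idea, learning a scorer so that relevant items outrank irrelevant ones pair by pair, can be illustrated with a toy linear model and a logistic pairwise loss. Predector's actual model is gradient-boosted trees trained with XGBoost's pairwise objective; this pure-Python sketch with made-up features only demonstrates the principle.

```python
import math

# Two toy features per protein; label 1 = known effector, 0 = other.
X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.2]]
y = [1, 1, 0, 0]
w = [0.0, 0.0]  # linear scorer weights

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Gradient descent on the pairwise logistic loss: for every (relevant,
# irrelevant) pair, push the relevant item's score above the irrelevant one's.
for _ in range(200):
    grad = [0.0, 0.0]
    for i in range(len(X)):
        for j in range(len(X)):
            if y[i] > y[j]:  # protein i should outrank protein j
                margin = score(X[i]) - score(X[j])
                g = -1.0 / (1.0 + math.exp(margin))  # d(loss)/d(margin)
                for k in range(2):
                    grad[k] += g * (X[i][k] - X[j][k])
    w = [wk - 0.1 * gk for wk, gk in zip(w, grad)]

ranked = sorted(range(len(X)), key=lambda i: score(X[i]), reverse=True)
print(sorted(ranked[:2]))  # the two effectors occupy the top two ranks
```

LambdaMART applies the same pairwise principle but scales the pair gradients by their effect on a rank metric such as NDCG, and fits boosted regression trees rather than a linear model.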
Model and score evaluation
The learning to rank model, manually designed scores, and EffectorP pseudo-probabilities were evaluated using rank summarisation statistics from the scikit-learn library 75 , which included the coverage error (the rank of the lowest-scoring effector), label ranking average precision (LRAP; the average proportion of correctly labelled samples with a lower score than each position in the sorted results), the label ranking loss (the average number of results that are incorrectly ordered), and the normalised discounted cumulative gain (NDCG; the sum of all ranking priorities each divided by the log2 of the rank position in the sorted list (DCG), normalised by the best theoretically possible DCG score) 74 . NDCG, LRAP, and label ranking loss were also evaluated for the top 50, 500, and 5000 proteins (indicated with the suffix @50, @500, or @5000). Additionally, to compare the classification performance of the learn to rank model with the combined EffectorP and secretion prediction decisions, a decision threshold of 0 was set for the learn to rank model (with > 0 indicating an effector prediction), and we report the classification metrics precision (the proportion of predicted effectors that are labelled as true effectors), recall (the proportion of known effectors that are predicted to be effectors), accuracy (the fraction of correct predictions), balanced accuracy (the arithmetic mean of sensitivity and specificity, which is less affected by unbalanced datasets than accuracy), F1 score (the harmonic mean of precision and recall), and Matthews correlation coefficient (MCC). For unbalanced datasets like the training set of effectors and non-effectors, MCC is considered a more reliable indicator of model performance than the other metrics mentioned above 48 .
Additionally, to evaluate the performance at different decision thresholds, the precision, recall, and MCC were calculated for 100 score thresholds along the range of each score, and the receiver operating characteristic (ROC) curves were plotted.
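The scikit-learn rank metrics described above can be applied as follows (toy labels and scores with a single "query" covering the whole protein set; requires scikit-learn and NumPy):

```python
import numpy as np
from sklearn.metrics import (coverage_error, label_ranking_average_precision_score,
                             label_ranking_loss, ndcg_score)

# 1 = known effector, 0 = other; scores from a hypothetical ranking model.
y_true = np.array([[1, 0, 1, 0, 0]])
y_score = np.array([[0.9, 0.5, 0.4, 0.3, 0.1]])

print("coverage error:", coverage_error(y_true, y_score))  # rank of lowest-scoring effector
print("LRAP:", label_ranking_average_precision_score(y_true, y_score))
print("ranking loss:", label_ranking_loss(y_true, y_score))
print("NDCG@3:", ndcg_score(y_true, y_score, k=3))
```

Here the lowest-scoring effector sits at rank 3 (coverage error 3), and one of the six (effector, non-effector) pairs is mis-ordered (ranking loss 1/6).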
For the effector ranking scores, only known effectors were used as the relevant (positive) set, with the irrelevant (negative or unlabelled) set consisting of the secreted, non-secreted, and proteome datasets. Because EffectorP is intended to be run on secreted datasets, ranking statistics were only calculated for the subset of proteins that were predicted to have a signal peptide (by any method) and with fewer than two predicted TM domains (by either Phobius or TMHMM), and classification statistics were considered both on this secreted subset and as a combined classifier (secretion and EffectorP prediction) on the whole datasets. For the secretion ranking score, the positive set consisted of the known effectors and the SwissProt secreted set, and the negative set was made up of the SwissProt non-secreted proteins.
[Truncated figure captions: violin plots of score distributions per dataset, where "secreted" and "non-secreted" proteins are manually annotated proteins from the SwissProt database, proteomes are the complete predicted proteomes of 10 well-studied fungi (Supplementary Table 2), and the number of proteins per violin is indicated on the x-axis; and pairwise comparisons of Predector scores with EffectorP versions 1 and 2 for the testing dataset, with scatter plots coloured by secretion prediction (yellow = predicted secreted, blue = non-secreted) and density plots of score distributions along the diagonal.]