The entire pipeline is based on multiple sequence alignments, so their quality is of great importance for functional assays. The parameters of MAFFT (Nakamura, Yamada et al. 2018) were therefore adjusted to meet the alignment requirements of every alignment step. This is particularly important when aligning the short primer sequences for visualization: in this case, the `--addfragments` parameter of MAFFT is used to align the short primers correctly to their origin. Another important parameter is `--adjustdirection`, which enables the automated detection and adjustment of the strand direction in which sequences are provided, as well as the mapping of the reverse primers.
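The two MAFFT calls described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual invocation: the function names and file names are hypothetical, the use of `--auto` for the initial alignment is an assumption, and the exact combination of options used by the pipeline is not stated in the text. Only the flags `--adjustdirection` and `--addfragments` are taken from the description.

```python
import subprocess
from typing import List


def mafft_align_cmd(fasta: str) -> List[str]:
    """Command for the initial alignment of all input sequences.

    --auto is an assumption; --adjustdirection lets MAFFT flip
    sequences that were supplied on the opposite strand.
    """
    return ["mafft", "--auto", "--adjustdirection", fasta]


def mafft_add_primers_cmd(primers_fasta: str, alignment_fasta: str) -> List[str]:
    """Command for adding short primers to an existing alignment.

    --addfragments takes the fragment file as its argument; the
    existing alignment is the final positional argument.
    """
    return [
        "mafft",
        "--adjustdirection",
        "--addfragments", primers_fasta,
        alignment_fasta,
    ]


def run_mafft(cmd: List[str], out_path: str) -> None:
    """Run MAFFT; it writes the alignment to stdout, so redirect to a file."""
    with open(out_path, "w") as handle:
        subprocess.run(cmd, stdout=handle, check=True)
```

A call such as `run_mafft(mafft_add_primers_cmd("primers.fasta", "filtered.aln"), "final.aln")` would then produce the alignment used for visualization, provided MAFFT is installed and on the `PATH`.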
To avoid unwanted distortions of the consensus scores due to overrepresented sequences, identical sequences are removed from the alignment unless specified otherwise using the `--keepduplicates` parameter. The `--consensussimilarity` parameter defines the similarity cutoff for each sequence in comparison to the consensus sequence of the alignment. The default value of 0.8, for example, means that a sequence must share at least 80% of its aligned nucleotides with the consensus sequence; otherwise, the sequence is removed from the alignment used for primer prediction. To identify ideal consensus oligos, the consensus sequence is recalculated on the filtered alignment. To this end, the pipeline uses MAFFT to align all input sequences in a global multiple sequence alignment and identifies the most common nucleotide at every alignment position. In addition, a consensus score is calculated for every alignment position as the ratio of the count of the most common nucleotide or gap symbol (-) at that position to the total number of sequences. All letters other than A, T, G, and C are treated as gaps. A perfectly conserved position, at which all sequences are identical, is thus assigned a consensus score of 1. The pipeline allows the user to control the quality of the consensus sequence used for primer prediction via the `--consensusthreshold` parameter. The default value of 0.95 ensures that the most abundant nucleotide occurs in at least 95% of the sequences at the given position. In addition, regions above the threshold must have a contiguous length of at least 20 nucleotides. All regions that fall below these values are excluded from the subsequent primer prediction.
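The filtering and scoring steps described above can be illustrated with a short sketch. It is not the pipeline's code: all function names are hypothetical, and two details are assumptions, namely that the similarity of a sequence is measured over its non-gap positions only and that ties between equally frequent symbols are resolved in favour of the symbol seen first.

```python
from collections import Counter
from typing import List, Tuple


def normalise(seq: str) -> str:
    """Upper-case a sequence and map every non-ACGT letter to a gap."""
    return "".join(c if c in "ACGT" else "-" for c in seq.upper())


def consensus_with_scores(aligned: List[str]) -> Tuple[str, List[float]]:
    """Most common symbol per column and its share of all sequences."""
    rows = [normalise(s) for s in aligned]
    consensus, scores = [], []
    for column in zip(*rows):
        symbol, count = Counter(column).most_common(1)[0]
        consensus.append(symbol)
        scores.append(count / len(rows))
    return "".join(consensus), scores


def similarity(seq: str, consensus: str) -> float:
    """Fraction of a sequence's aligned nucleotides that match the consensus.

    Assumption: gap positions of the sequence itself are ignored.
    """
    pairs = [(a, b) for a, b in zip(normalise(seq), consensus) if a != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def filter_alignment(aligned: List[str], cutoff: float = 0.8,
                     keep_duplicates: bool = False) -> List[str]:
    """Drop duplicates (optionally) and sequences below the similarity cutoff."""
    rows = [normalise(s) for s in aligned]
    if not keep_duplicates:
        rows = list(dict.fromkeys(rows))  # keeps first occurrence, preserves order
    consensus, _ = consensus_with_scores(rows)
    return [r for r in rows if similarity(r, consensus) >= cutoff]
```

The consensus and its scores would then be recalculated by calling `consensus_with_scores` on the output of `filter_alignment`.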
Before the consensus regions are identified for primer design, all gaps are removed from the consensus sequence together with the corresponding values of the consensus scores. This is necessary because gaps do not correspond to nucleotides and are therefore not relevant for primer design. Gaps in the consensus sequence arise from insertions in one or more related sequences of the alignment and serve only to improve visualization.
From this “gapless consensus sequence” the regions relevant for primer design are identified.
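A minimal sketch of gap removal and region detection is given below. The function names are hypothetical; the defaults of 0.95 and 20 correspond to the values named in the text. Regions are reported as 0-based, half-open `(start, end)` intervals on the gapless consensus, which is a convention chosen for this illustration.

```python
from typing import List, Sequence, Tuple


def strip_gaps(consensus: str, scores: Sequence[float]) -> Tuple[str, List[float]]:
    """Remove gap columns from the consensus and the matching score values."""
    kept = [(c, s) for c, s in zip(consensus, scores) if c != "-"]
    return "".join(c for c, _ in kept), [s for _, s in kept]


def conserved_regions(scores: Sequence[float], threshold: float = 0.95,
                      min_length: int = 20) -> List[Tuple[int, int]]:
    """Contiguous runs with score >= threshold that span at least min_length."""
    regions = []
    start = None
    # A sentinel below any threshold closes a run that reaches the end.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions
```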
Primer3 (Untergasser, Cutcutache et al. 2012) searches for a primer pair in a contiguous sequence section. Therefore, instead of specifying the area in which primers are to be searched (SEQUENCE_INCLUDED_REGION and SEQUENCE_INTERNAL_INCLUDED_REGION), the pipeline excludes all areas in which primers are not to be searched (SEQUENCE_EXCLUDED_REGION and SEQUENCE_INTERNAL_EXCLUDED_REGION). This allows the prediction of primers in non-consecutive sequence segments. Furthermore, the gapless consensus sequence is automatically written into the Primer3 parameter file (SEQUENCE_TEMPLATE). All other parameters, such as melting temperatures or primer lengths, are taken from the user-defined Primer3 parameter file (for a detailed parameter description, see the Primer3 manual, https://primer3.org/manual.html). From this input, Primer3 predicts the optimal consensus primers and writes the results to a plain text file. The ConsensusPrime pipeline reads the Primer3 output and creates a comprehensive report in .html format that includes all details of the previous filter steps. The predicted primers are added to a final alignment, which can be visualized using ClustalX (Larkin, Blackshields et al. 2007) and manually inspected for logical errors. Note that the reverse primer is added to the final alignment as its reverse complement.
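The conversion of retained regions into excluded regions can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's implementation: the function names are hypothetical, retained regions are assumed to be 0-based, half-open `(start, end)` intervals, and the record is written in Primer3's Boulder-IO format, in which excluded regions are given as space-separated `start,length` pairs and a record ends with a line containing only `=`.

```python
from typing import Dict, List, Tuple


def excluded_regions(kept: List[Tuple[int, int]],
                     template_length: int) -> List[Tuple[int, int]]:
    """Complement of the retained (start, end) regions as (start, length) pairs."""
    excluded = []
    cursor = 0
    for start, end in sorted(kept):
        if start > cursor:
            excluded.append((cursor, start - cursor))
        cursor = max(cursor, end)
    if cursor < template_length:
        excluded.append((cursor, template_length - cursor))
    return excluded


def boulder_record(template: str, kept: List[Tuple[int, int]],
                   user_params: Dict[str, object]) -> str:
    """Merge user-defined tags with the template and excluded regions."""
    pairs = excluded_regions(kept, len(template))
    excluded = " ".join(f"{start},{length}" for start, length in pairs)
    tags = dict(user_params)
    tags["SEQUENCE_TEMPLATE"] = template
    if excluded:
        tags["SEQUENCE_EXCLUDED_REGION"] = excluded
        tags["SEQUENCE_INTERNAL_EXCLUDED_REGION"] = excluded
    return "".join(f"{key}={value}\n" for key, value in tags.items()) + "=\n"


def reverse_complement(seq: str) -> str:
    """Reverse complement, as used when adding the reverse primer to the alignment."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]
```

Expressing the search space through exclusions rather than a single included region is what allows several separate conserved segments to remain available to Primer3 at once.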