I. SmithWaterman Algorithm
The SmithWaterman algorithm is an unwinding recursion method for identifying the best local alignment between two sequences. It is a more sophisticated version of the NeedlemanWunsch algorithm, which aligns the entire length of two sequences[1]. The SmithWaterman algorithm is handy for analyzing biological sequences, such as DNA or protein sequences, where mutations and insertions/deletions are common[2].
II. Scoring Matrix Calculation:
The SmithWaterman algorithm initiates by constructing a scoring matrix that represents the similarity scores for all possible alignments of two sequences[3]. The algorithm iteratively fills the matrix by comparing pairs of characters from the sequences and adding up scores based on the match/mismatch, insertion, or deletion.
The scoring matrix is calculated using the following. equation:
where:
S(i, j) is the score at position(i, j) in the scoring matrix.
H(i, j) is the score at position (i, j) in the scoring matrix, excluding the current pair of characters.
M is the match score (positive for a match[1], negative for a mismatch[1]).
d is the gap penalty (negative for a gap).
The max function selects the maximum of the three possible scores: the score from matching the current pair of characters, the score from extending a gap in the first sequence, and the score from extending a gap in the second sequence[3]. The 0 term is used to prevent negative scores from accumulating in the matrix.
Traceback:
Once the scoring matrix is filled, the next step is to perform traceback to identify the best local alignment[3]. Traceback starts from the highest score in the matrix and follows the path through the matrix that maximizes the score at each step. The traceback path represents the aligned sequences with gaps and mismatches.
Alignment Generation:
The final step is to generate the aligned sequences based on the traceback path. This involves inserting gaps () into the sequences where necessary. The aligned sequences represent the best local alignment identified by the SmithWaterman algorithm.[2]
Example:
Consider the following two sequences:
Sequence 1: ACGT
Sequence 2: AGTC
The SmithWaterman algorithm would calculate the following scoring matrix:
Table 1. Matrix Calculation


A 
C 
G 
T 

0 
0 
0 
0 
0 
A 
0 
2 
1 
0 
0 
G 
0 
1 
1 
3 
2 
T 
0 
0 
0 
2 
5 
C 
0 
0 
2 
1 
4 
The highest score in the matrix is 5, located at position (4, 5). Traceback from this position would result in the following aligned sequences:
Sequence 1: ACGT_
Sequence 2: A_GTC
This represents the best local alignment between the two sequences, with a similarity score of 5.[12]
III. Implementation
The Banded SmithWaterman algorithm is implemented on an FPGA. In this algorithm, the scores in a band about the diagonal are computed. This is based on the premise that the optimal alignment may lie close to the diagonal. [14]
The hardware realization employed a systolic array architecture to perform the scoring and traceback phases. Independent computations along the antidiagonal line were executed concurrently using B Processing Elements (PEs).
The architecture, specifically designed for 4 PEs, is depicted in Fig (1). The scoring phase is categorized into three regions :
Top corner region, bottom corner region, and middle band extension region. In the top and bottomcorner regions, specific PEs are active which undergo continuous rightward shifts on the matrix.[10]
In the middle band extension region, all PEs are active, and there are alternating rightward and downward shifts. A rightward (downward) shift is implemented by shifting only the
R(Q) shift register, 1 leaving the Q(R) inputs to the PEs unchanged. This is illustrated in Fig(2).
In the architecture, B cycles are needed to initialize the Q register. (B − 1) rightward shifts are performed at the beginning followed by 2(L − B) alternate downward along with rightward shifts in the middle, and (B − 1) rightward shifts at the end. After calculating each PE's entry in the traceback matrix, the maximum score is represented by a 3bit relative position that is saved in an intermediate traceback memory.
For the next aligned entry, the entries must explicitly point to the PE and its memory address because each PE has a unique traceback memory. Considering concurrent calculations, this encoding approach mirrors the data dependency scheme in the three zones of operation.
Owing to concurrent calculations, the outcomes of previous cycles' computations of other PEs are all that are needed for a PE. The start traceback signal is used to start the traceback module once the traceback memory is full.
The module gets data from the traceback memory and performs traceback by sequentially shifting to locations indicated by the current relative position until a null 1 pointer is detected. This process starts at the bottomright cell. The traceback module outputs an aligned base pair for R and Q and is loaded into a register array at each cycle. Fig.1 presents the traceback module's construction.[8]
With a twocycle buffer between scoring and traceback, the traceback takes L cycles, or the alignment's length. C = B 1
+ 2L + Lal cycles are required to formulate the result, which is an addition of the previous cycle counts. This determines the score for appropriate inputs. The accelerator transmits the ready signal, instructing the core to receive the results, when the traceback module's activities are complete. Verilog HDL was used to implement the complete architecture, especially for the cases when L = 8 and B = 4.[3].