The web tool described in this paper [26] operates in two modes that broadly represent translation and transcription. The audio display is generated by algorithms based on biological rules, which produce sound at the play-head. The play-head stands in for the ribosome in translation mode, or for the RNA replicase/polymerase in transcription mode. A complex auditory stream was generated by overlaying up to 12 layers of audio (summarised in Table 1). Each layer of audio is derived either directly from an RNA motif or from metadata used to flag the region of sequence to be sonified.
Table 1. The mapping of each RNA feature into a layer of the auditory display
RNA feature | Description | Note range | When the feature is sonified
--- | --- | --- | ---
Nucleotide | Sonified as the sequence is processed, creating a constant audio stream | 4 | Throughout the genome
Di-nucleotide | Sonified as the sequence is processed, creating a constant audio stream | 16 | Throughout the genome
GC content (10 bp window) | Sonified as the sequence is processed, creating a constant audio stream | 10 | Throughout the genome
GC content (100 bp window) | Sonified as the sequence is processed, creating a constant audio stream | 10 | Throughout the genome
Three of the same nucleotide repeats | Example: the poly-A tail | 4 | Whenever the condition is true
Codons (translation only) | Codon frame 1 | 20 | Between start and stop codons
Codons (translation only) | Codon frame 2 | 20 | Between start and stop codons
Codons (translation only) | Codon frame 3 | 20 | Between start and stop codons
Trinucleotides (transcription only) | Only the 1st and 3rd nucleotides are considered | 16 | Throughout the genome
Untranslated regions | Intergenic untranslated regions (excluding the 5’ and 3’ UTRs) | 16 | At genomic regions defined by GenBank metadata [2]*
Transcription regulatory sequences (TRS) | Each nucleotide in TRS1 through to TRS10 | 16 | At genomic regions defined by GenBank metadata [2]*
Polyprotein cleavage sites (translation only) | Nucleotides that code for the cleaved AA residues | 4 | At genomic regions defined by GenBank metadata [2]*
Stem-loop regions (SL) | Each nucleotide in the identified region | 16 | At genomic regions defined by GenBank metadata [2]*
* For these metadata-defined features, individual nucleotides were mapped to higher octave ranges for the sake of audio clarity.
The most fundamental building block of RNA is the individual nucleotide; these were sonified as one of four notes, whereas di-nucleotides were sonified as one of 16 notes, and the two streams were panned left and right in the auditory display. Another characteristic of nucleic acid sequences, often used as a metric of genome status, is the GC content, usually expressed as a ratio. In coronaviruses the frequency of U is typically above average and that of C below average, and A is preferred over G [27], which leads to a relatively low GC ratio. In our approach, two GC ratios were calculated across the entire genome using sliding windows of 10 and 100 nucleotides respectively. Each time a GC ratio changed by an increment or decrement of 0.1, a note was generated, and the two GC streams were panned against each other in stereo. When there is no change between two adjacent features in an audio stream, the first instance of the feature is allowed to play for longer rather than generating another instance of the same note. This provides a brief pause in that audio layer and gives the listener an opportunity to distinguish another layer in the auditory display. Together these four audio tracks create an output that can be heard across the entire auditory display of the genome. These RNA features are specific neither to transcription nor to translation, nor to a particular region of the genome. Other sonified genome features were layered over this sonified landscape.
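The GC layers described above can be sketched as follows. This is a minimal illustration, not the tool's code: the function name `gcLevelEvents` is hypothetical, and it assumes the ratio is quantised into ten 0.1-wide levels (matching the 10-note range in Table 1), with a new note event emitted only when the level changes, so an unchanged ratio simply sustains the previous note.

```javascript
// Hypothetical sketch: GC ratio over a sliding window, emitting an event only
// when the ratio crosses into a different 0.1-wide level (0..9).
function gcLevelEvents(seq, windowSize) {
  const isGC = (base) => base === "G" || base === "C";
  const events = [];
  let gc = 0; // running count of G/C bases inside the window
  let previousLevel = null;
  for (let i = 0; i < seq.length; i++) {
    if (isGC(seq[i])) gc += 1; // base entering the window
    if (i >= windowSize && isGC(seq[i - windowSize])) gc -= 1; // base leaving
    if (i < windowSize - 1) continue; // window not yet full
    // Integer arithmetic avoids floating-point error; clamp 1.0 into the top level
    const level = Math.min(9, Math.floor((gc * 10) / windowSize));
    if (level !== previousLevel) {
      events.push({ start: i - windowSize + 1, level });
      previousLevel = level; // unchanged levels produce no event (note is held)
    }
  }
  return events;
}
```

Calling it twice, once with a window of 10 and once with 100, gives the two GC streams that are panned against each other.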
In translation mode, codons represent an important feature of the RNA, and these were sonified as one of 20 notes corresponding to the 20 amino acids. No distinction was made between the initiating methionine and methionines that occur within the body of the peptide sequence. Stop codons were sonified as an additional note, since these are highly significant in the function of the genome. Codons in all three overlapping reading frames were sonified during translation so that open reading frames (ORFs) could be detected in any frame. An important consideration in modelling translation was to use the start and stop codons in each reading frame to trigger or halt the audio derived from the other codons. In the visual display, only the sonified codons were shown. Using this simple method, all gene sequences reported in the GenBank metadata were accurately represented in both the audio and visual displays. In addition, all open reading frames throughout the RNA genome are shown and sonified, but only those that correlated with the known metadata were labelled in the visual display. This is consistent with prior approaches that map individual bases [28], codons [29] or amino acids [30, 31] to musical notes in a manner inspired by the genetic code or by codon usage during translation.
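The start/stop gating can be illustrated with a short sketch. It assumes AUG as the only start codon and UAA, UAG and UGA as stops; `findOrfs` is a hypothetical name, and frames are numbered 1 to 3 by their offset from the first nucleotide. A frame stays silent until a start codon opens it and a stop codon closes it again.

```javascript
// Hypothetical sketch: ORFs in the three forward frames, where AUG switches a
// frame's codon audio on and a stop codon switches it off.
const STOP_CODONS = new Set(["UAA", "UAG", "UGA"]);

function findOrfs(seq) {
  const orfs = [];
  for (let frame = 0; frame < 3; frame++) {
    let start = -1; // -1 means this frame is currently silent
    for (let i = frame; i + 3 <= seq.length; i += 3) {
      const codon = seq.slice(i, i + 3);
      if (start === -1 && codon === "AUG") {
        start = i; // start codon triggers the frame's audio
      } else if (start !== -1 && STOP_CODONS.has(codon)) {
        orfs.push({ frame: frame + 1, start, end: i + 3 }); // end includes the stop codon
        start = -1; // stop codon halts the frame's audio
      }
    }
  }
  return orfs;
}
```

In this sketch a frame that is still open when the sequence ends is not reported; a real display would need to decide how to handle such truncated ORFs.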
In the displays representing transcription, codons per se were not considered. Instead, trinucleotides were sonified, but these were treated as adjacent, non-overlapping units in the sequence. Because there are 64 different trinucleotides, they cannot be accommodated in a traditional scale. A traditional piano has 88 keys, spanning 7 octaves plus a minor third. With 7 scale notes per octave, 64 trinucleotides would require over 9 octaves (64/7 ≈ 9.1). Synthesised notes could overcome this limitation, but this would entail playing shrill, high-pitched notes that would be grating to the ear. A one-to-one mapping of the 64 trinucleotides to individual notes was therefore avoided. Instead, in the transcription display, trinucleotides were mapped to 16 notes by considering only the first and third position of each. Since trinucleotides play no functional role in transcription, no functionally relevant information was lost with this approach, and the audio could be designed to complement the single nucleotides and di-nucleotides in the audio stream while avoiding shrill notes. In addition, trinucleotides were not mapped to start or stop functionality; they are audible throughout the entire display across the genome, and their occurrence has no further effect in the auditory display.
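The 64-to-16 reduction can be sketched in a few lines. The function name `trinucleotideNotes` and the A, C, G, U index ordering are assumptions for illustration; the sketch assumes a clean sequence containing only A, C, G and U, steps through it in non-overlapping triplets, and ignores the middle base.

```javascript
// Hypothetical sketch: reduce each non-overlapping trinucleotide to one of 16
// note indices using only its 1st and 3rd nucleotides.
const BASE_INDEX = { A: 0, C: 1, G: 2, U: 3 }; // assumed ordering

function trinucleotideNotes(seq) {
  const notes = [];
  for (let i = 0; i + 3 <= seq.length; i += 3) { // adjacent, not overlapping
    const first = BASE_INDEX[seq[i]];
    const third = BASE_INDEX[seq[i + 2]]; // the middle base (i + 1) is ignored
    notes.push(first * 4 + third); // 4 x 4 = 16 possible note indices
  }
  return notes;
}
```

For example, `ACG` and `AUG` both map to index 2, which is how 64 trinucleotides collapse onto 16 notes.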
Metadata specific to the SARS-CoV-2 coronavirus sequence was used to supplement the audio generated from the intrinsic characteristics of the RNA sequence. Untranslated sequences between the open reading frames were also mapped to their own audio stream so that they could be clearly distinguished from the coding regions. The viral genome is also known to contain 10 transcription regulatory sequences (TRS). The conserved core of this motif was sonified, and since TRSs often occur within the untranslated regions, the two streams were panned against each other in stereo. The RNA sequence also contains five stem-loop structures known to play a role in the function of the genome [32], and their occurrence was sonified. The genome encodes a large polyprotein from a large open reading frame. This polyprotein is thought to be cleaved into 16 individual polypeptides, and the occurrence of the known cleavage sites was sonified. In addition to generating a short burst of notes, each cleavage site pauses the progression of the play-head for approximately one second. This effectively highlights the transition from one NSP sequence to the next. Runs of three or more identical nucleotides were also sonified, since these are easy to detect by eye and their sound may help users keep track of where they are in the display.
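Detecting runs of three or more identical nucleotides is a natural fit for a back-referencing regular expression. The sketch below is one possible approach, with `findRepeats` as a hypothetical name; it reports each maximal run once, with its base, start position and length.

```javascript
// Hypothetical sketch: locate runs of three or more identical nucleotides
// (for example a poly-A tail), which are sonified whenever they occur.
function findRepeats(seq) {
  const runs = [];
  const pattern = /([ACGU])\1{2,}/g; // a base followed by at least two copies of itself
  let match;
  while ((match = pattern.exec(seq)) !== null) {
    runs.push({ base: match[1], start: match.index, length: match[0].length });
  }
  return runs;
}
```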
The audio generated from each of these sequence motifs and from the metadata was combined to create a complex auditory display representing either transcription or translation. As the audio plays, a sliding window of 60 nucleotides is shown on the screen. At any point in time, the first nucleotide in this visual window (the play-head) is the one heard in the auditory display, and other sequence features are determined relative to the position of this nucleotide.
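A minimal sketch of the window and of resolving features relative to the play-head follows. The `viewAt` name and the `{ name, start, end }` region shape (0-based, end-exclusive) are assumptions, not the tool's data model.

```javascript
// Hypothetical sketch: the 60-nt visual window and the annotated features that
// overlap the play-head, i.e. the first nucleotide of the window.
const WINDOW_SIZE = 60;

function viewAt(seq, playHead, regions) {
  const visible = seq.slice(playHead, playHead + WINDOW_SIZE); // shorter near the 3' end
  // regions: [{ name, start, end }] with 0-based, end-exclusive coordinates (assumed)
  const active = regions
    .filter((region) => playHead >= region.start && playHead < region.end)
    .map((region) => region.name);
  return { visible, active };
}
```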
Playing the entire genome in translation mode takes approximately 93 minutes, which corresponds to approximately five nucleotides per second. This is slower than cellular translation, which is thought to proceed at approximately 30 nucleotides per second [33]; however, playing any faster makes the display difficult to interpret, because the interval between notes becomes very short, and a different algorithm would need to be devised. In transcription mode the full display lasts 120 minutes; the rate is slightly slower simply to distinguish it clearly from translation mode. Interactive buttons (summarised in Table 2) are provided for each sonified feature so that each can be played directly; for example, a gene sequence or TRS can be selected and played without playing through the preceding sequence data.
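As a rough check on these figures, the sketch below assumes a genome length of 29,903 nt (the SARS-CoV-2 reference sequence NC_045512.2). The tool's exact sequence is not restated here, and the estimate ignores the pauses at cleavage sites.

```javascript
// Back-of-envelope playback rates, assuming the 29,903-nt reference genome.
function basesPerSecond(genomeLength, minutes) {
  return genomeLength / (minutes * 60);
}

const GENOME_LENGTH = 29903;
console.log(basesPerSecond(GENOME_LENGTH, 93).toFixed(1));  // 5.4 (translation mode)
console.log(basesPerSecond(GENOME_LENGTH, 120).toFixed(1)); // 4.2 (transcription mode)
// Minutes needed at the cellular rate of about 30 nt per second
console.log((GENOME_LENGTH / 30 / 60).toFixed(1));          // 16.6
```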
Table 2. Description of the navigation buttons from which users can begin playing the audio and visual displays

Button set 1: RNA features associated with coding regions
Button | Description
--- | ---
5’UTR | 5’ untranslated region
Poly-/-Protein | Two buttons representing the coding region before and after the -1 frameshift position of the large polyprotein
9 U regions | Each navigates to an untranslated region between ORFs
-S- | Region coding for the canonical S protein
-E- | Region coding for the canonical E protein
-M- | Region coding for the canonical M protein
-N- | Region coding for the canonical N protein
ORF 3a, ORF 6, ORF 7a, ORF 7b, ORF 8, ORF 10 | Regions thought to code for other proteins or polypeptides
3’UTR | 3’ untranslated region

Button set 2: RNA features associated with the NSP proteins
Button | Description
--- | ---
5’UTR | 5’ untranslated region
N1 - N16 | Location of the 16 NSP proteins within the large polyprotein
14 C sites | Cleavage sites within the translated polyprotein giving rise to the 16 individual NSP proteins
S - ORF 10 | Region of the RNA sequence downstream of the polyprotein
3’UTR | 3’ untranslated region

Button set 3: RNA features associated with the TRS regions
Button | Description
--- | ---
5’UTR | 5’ untranslated region
T1 - T10 | Location of TRS 1 to TRS 10. TRS1 is sometimes referred to as the leader TRS and is linked to the subsequent TRS 2 - 10 to produce the sub-genomic RNAs during transcription.
5 SL regions | Stem-loop regions giving rise to structured regions of RNA, formed through sequence complementarity and base pairing.
12 Seq regions | Undefined sequences between the TRS regions; these often correspond closely to the ORF regions
3’UTR | 3’ untranslated region
In this study, auditory streams were paired and played as stereo layers. Audio that plays continuously throughout the entire genome was played in a low register, and transient data was highlighted in a higher register to make it more prominent. Beyond the basic construction of pitch and stereo separation, the data was harmonised to make it more listenable. The root and third of the scale were emphasised in the limited four-note mononucleotide sonification to establish a strong harmonic tone throughout playback. The drone generated from the GC content (which is sometimes invariant for periods of time) was also used to reinforce the basic scale harmony. G and C bases, whether as nucleotides, di-nucleotides or trinucleotides, were matched to higher octaves, and A and U to lower octaves. This was done consistently across these audio streams in an attempt to harmonise the otherwise random note selection driven by the sequence. An exception to this principle was made for start and stop codons, which were mapped to higher pitches than GC-rich codons so that their occurrence was easily perceived in the auditory display (since higher-pitched notes are perceived as louder). Given that these codons trigger and halt individual audio streams, this approach further emphasises the occurrence of an open reading frame.
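The register rule can be illustrated for translation mode using the scale degrees and octaves listed in Table 3 (nucleotides: degrees 1 and 3, octaves 2 and 3; di-nucleotides: degrees 1, 4, 5 and 6, octaves 1 to 4). This is only a sketch. It assumes that Table 3's octave numbers correspond to scientific pitch notation, and the specific pairing of each base with a degree and octave is a hypothetical choice that follows the G/C-higher principle.

```javascript
// Bb Aeolian, scale degrees 1..7
const BB_AEOLIAN = ["Bb", "C", "Db", "Eb", "F", "Gb", "Ab"];

// Nucleotides: degrees 1 and 3, octaves 2 and 3 (Table 3).
// A/U take the lower octave, G/C the higher one; the pairing itself is assumed.
const NUCLEOTIDE_MAP = {
  A: { degree: 1, octave: 2 },
  U: { degree: 3, octave: 2 },
  G: { degree: 1, octave: 3 },
  C: { degree: 3, octave: 3 },
};

// Di-nucleotides: degrees 1, 4, 5, 6 and octaves 1-4 (Table 3).
// Assumed: the first base picks the octave (A/U low, G/C high), the second the degree.
const DINUC_OCTAVE = { A: 1, U: 2, G: 3, C: 4 };
const DINUC_DEGREE = { A: 1, U: 4, G: 5, C: 6 };

function noteName(degree, octave) {
  return BB_AEOLIAN[degree - 1] + octave; // e.g. "Bb3"
}

function nucleotideNote(base) {
  const { degree, octave } = NUCLEOTIDE_MAP[base];
  return noteName(degree, octave);
}

function dinucleotideNote(pair) {
  return noteName(DINUC_DEGREE[pair[1]], DINUC_OCTAVE[pair[0]]);
}
```

Note names in this form (for example `"Bb3"`) are the kind of pitch string that Tone.js instruments accept.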
The wider note range of the codons (20 notes) was used to introduce leading tones, which often sound more dissonant and add complexity to the harmonic spectrum. This allows codons to be easily discerned above the background tones of the simpler motifs. Lastly, less frequent audio from dispersed regions of the genome was pitched in the highest octave ranges, or on the more dissonant notes within the diatonic scale, to provide clarity. All of this was done within modes of the diatonic major scale: translation was played in Bb Aeolian (Bb, C, Db, Eb, F, Gb, Ab), whereas transcription was played in C Lydian (C, D, E, F#, G, A, B). The parameters for mapping each RNA feature into an audio stream are summarised in Table 3. These choices are arbitrary, and later iterations of the tool may allow users to choose the mode and key. The Ionian mode of the major scale was avoided, since it is generally considered happy-sounding and inappropriate for the data.
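The two modes can be derived from their step patterns. In the sketch below, `spellMode` is a hypothetical helper that assigns one letter name per degree so that flats and sharps come out as written above. It reproduces the Bb Aeolian and C Lydian note lists given in the text.

```javascript
// Hypothetical helper: spell the seven notes of a diatonic mode from its root.
const LETTERS = ["C", "D", "E", "F", "G", "A", "B"];
const NATURAL_PC = { C: 0, D: 2, E: 4, F: 5, G: 7, A: 9, B: 11 }; // pitch classes
const MODE_STEPS = { // semitone steps between successive degrees
  ionian: [2, 2, 1, 2, 2, 2, 1],
  lydian: [2, 2, 2, 1, 2, 2, 1],
  aeolian: [2, 1, 2, 2, 1, 2, 2],
};

function spellMode(root, mode) {
  const letter = root[0];
  const accidental = [...root.slice(1)].reduce((n, ch) => n + (ch === "#" ? 1 : -1), 0);
  const rootPc = (NATURAL_PC[letter] + accidental + 12) % 12;
  const letterIndex = LETTERS.indexOf(letter);
  const notes = [];
  let offset = 0; // semitones above the root
  for (let degree = 0; degree < 7; degree++) {
    const name = LETTERS[(letterIndex + degree) % 7]; // one letter per degree
    const target = (rootPc + offset) % 12;
    // Signed distance from the natural letter, folded into the range -6..5
    const shift = ((target - NATURAL_PC[name] + 18) % 12) - 6;
    notes.push(name + (shift > 0 ? "#".repeat(shift) : "b".repeat(-shift)));
    offset += MODE_STEPS[mode][degree];
  }
  return notes;
}

console.log(spellMode("Bb", "aeolian").join(", ")); // Bb, C, Db, Eb, F, Gb, Ab
console.log(spellMode("C", "lydian").join(", "));   // C, D, E, F#, G, A, B
```

Spelling by letter rather than by semitone alone avoids enharmonic spellings such as A# in place of Bb.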
Each nucleotide generates a note on every beat, whereas each di-nucleotide generates a note every second beat and each codon (within an ORF) a note every third beat. Together these notes are syncopated to create a characteristic sound during peptide translation that is distinct from the surrounding untranslated region. Audio from a GC track is only triggered when the GC ratio changes by an increment of 0.1. If a note sequence contains repeated notes, the length of the note is extended rather than the note being repeated. This creates space and clarity for other notes layered in the auditory display.
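The beat spacing and the held-note rule can be sketched together. `scheduleLayer` is a hypothetical name, and durations are counted in beats. Each layer is given a `beatsPerNote` value (1 for nucleotides, 2 for di-nucleotides, 3 for codons), and consecutive identical notes are merged into one longer event.

```javascript
// Hypothetical sketch: turn one layer's note stream into timed events, holding
// a repeated note instead of re-triggering it.
function scheduleLayer(notes, beatsPerNote) {
  const events = [];
  notes.forEach((note, i) => {
    const last = events[events.length - 1];
    if (last && last.note === note) {
      last.duration += beatsPerNote; // extend the previous note
    } else {
      events.push({ note, startBeat: i * beatsPerNote, duration: beatsPerNote });
    }
  });
  return events;
}
```

For example, a di-nucleotide layer producing `["C4", "C4", "E4"]` yields a four-beat C4 followed by a two-beat E4, which leaves a gap in which other layers can be heard.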
Table 3. Scale degrees and instrumentation of the RNA features being sonified
Sonified motif | Instrument | Pan | Translation (Bb Aeolian): scale degrees | Translation: octave | Transcription (C Lydian): scale degrees | Transcription: octave
--- | --- | --- | --- | --- | --- | ---
Nucleotide | Synth | L | 1, 3 | 2, 3 | 1, 5 | 2, 3
Di-nucleotide | Synth | R | 1, 4, 5, 6 | 1, 2, 3, 4 | 1, 3, 5 | 1
GC content (10 bp) | AM synth + delay | L | 1, 3, 6, 7 | 2, 3 | 1, 3, 5, 7 | 4, 5
GC content (100 bp) | AM synth + delay | R | 1, 3, 6, 7 | 2, 3 | 1, 3, 5, 7 | 4, 5
3 bp repeat | Synth | L | 1, 3 | 4 | 1, 4, 5 | 6
Codon frame 1 (translation) | FM synth | L | 1, 3, 4, 5, 7 | 2 | - | -
Codon frame 2 (translation) | FM synth | C | 1, 3, 4, 5, 7 | 2 | - | -
Codon frame 3 (translation) | FM synth | R | 1, 3, 4, 5, 7 | 2 | - | -
Trinucleotide (transcription) | FM synth | L | - | - | 1, 3, 4, 5, 7 | 3
Untranslated regions | AM synth | R | 1, 2, 3 | 5 | 1, 4, 6, 7 | 3
Transcription regulatory sequences (TRS) | AM synth | L | 1, 2, 4, 5, 6 | 5 | 1, 2, 3, 4, 5, 6, 7 | 6
Cleavage sites in the polyprotein | AM synth + distortion | L | 1, 6, 7 | 4 | 1, 2, 3 | 6
Stem-loop regions (SL) | AM synth | R | 1, 2, 6, 7 | 5 | 1, 4, 5, 7 |
Pan: L = left, R = right, C = centre. A dash indicates that the motif is not sonified in that mode.
Translation of the genomic RNA leads to expression of a large polyprotein following ribosome binding to the 5’ untranslated region. However, the genes downstream of the polyprotein cannot be expressed directly from this genomic template, presumably because translation terminates at the stop codon at the end of the polyprotein gene. The sonification in the display also stops at this point; however, playback can be resumed to inspect the downstream sequence.
One of the more interesting characteristics of the viral genome is the phenomenon of discontinuous transcription, whereby a template switch occurs during the synthesis of sub-genomic negative-strand RNAs [5]. Various mechanisms have been proposed to explain how the transcription regulatory sequences (TRS) are involved in the synthesis of positive-strand sub-genomic RNA from various negative-strand intermediates [34]. TRSs are located in the untranslated regions between the genes, and one model suggests that they allow transcription to skip to the TRS located in the 5’ untranslated region. This process is driven by complementary interactions between TRS regions and adds a copy of the leader sequence to form sub-genomic RNA species. In these sub-genomic RNAs the polyprotein sequence is omitted, so a ribosome binding at the 5’ end can read through and express the contiguous downstream gene sequence [35]. This functional behaviour of the RNA has been built into the auditory and visual display. By default, auditory translation runs from the 5’ end through to the stop codon at the end of the polyprotein, whereas transcription runs the full length of the RNA beginning at the 3’ end. A toggle switch has been implemented to change these behaviours. When the toggle is selected in transcription mode, the play-head skips from any upcoming TRS region to the leader TRS1 located in the 5’ region (mimicking the behaviour of the RNA replicase). Correspondingly, in translation mode with the toggle activated, the play-head will, for example, skip from the leader TRS1 (omitting the polyprotein) to the TRS2 region adjacent to the start of the S protein. Although the metadata used to drive this behaviour does not change the characteristics of the sound, it does change which regions are sonified.
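The transcription-mode toggle can be sketched as a play-head update rule. The option names (`toggle`, `bodyTrsEntries`, `leaderTrs`, `step`) are hypothetical, and the positions in the usage note are illustrative rather than the genome's real TRS coordinates. Playback moves from the 3’ end towards the 5’ end (`step = -1`). When the toggle is on and the play-head reaches a body TRS, it jumps to the leader TRS1.

```javascript
// Hypothetical sketch of the template-switch toggle in transcription mode.
function nextPlayHead(pos, { toggle, bodyTrsEntries, leaderTrs, step }) {
  const next = pos + step; // step = -1 when moving from the 3' end toward the 5' end
  if (toggle && bodyTrsEntries.includes(next)) {
    return leaderTrs; // template switch: continue from the leader TRS1
  }
  return next;
}
```

With illustrative values `{ toggle: true, bodyTrsEntries: [500, 900], leaderTrs: 70, step: -1 }`, a play-head at position 501 moves next to 70. With the toggle off, it moves to 500 and carries on along the genome.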
The website does not rely on a server for computation; instead, the entire RNA sequence is downloaded into the client browser when the page is loaded. All code is written in JavaScript and runs within the client browser. The React framework was used to create the environment state, in which each state update advances the sliding window by one base. Redux was also used to help manage state. Audio is generated in real time within the client browser using Tone.js, and the Reactronica framework [36] was used to further manage audio within the environment state.
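A Redux-style reducer for the play-head might look like the sketch below. It is plain JavaScript with no library imports, and the action names and state shape are assumptions rather than the tool's actual code. `STEP` advances the sliding window by one base while playback is running, and `JUMP` is the kind of action a navigation button could dispatch.

```javascript
// Hypothetical Redux-style reducer for playback state; each STEP moves the
// sliding window forward by one base.
const initialState = { playHead: 0, playing: false };

function playbackReducer(state = initialState, action) {
  switch (action.type) {
    case "PLAY":
      return { ...state, playing: true };
    case "PAUSE":
      return { ...state, playing: false };
    case "STEP":
      // Clamp at the last base; a new object is returned, never a mutation
      return state.playing
        ? { ...state, playHead: Math.min(state.playHead + 1, action.genomeLength - 1) }
        : state;
    case "JUMP":
      return { ...state, playHead: action.position }; // e.g. from a Table 2 button
    default:
      return state;
  }
}
```

Returning new state objects rather than mutating them is the convention Redux expects, and it lets React re-render the visual window on each step.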
Translation of the viral polyprotein is known to involve a -1 ribosomal frameshift. Since this does not follow the normal rules of gene expression, a conditional expression was used to change the display at that position so that translation shifts from frame 2 to frame 1 in both the visual and auditory displays.
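The effect of a -1 frameshift on codon boundaries can be sketched as follows. The names `codonStarts` and `frameOf` and the `shiftAt` parameter are hypothetical. Frames are numbered 1 to 3 so that frame k holds codons starting at 0-based positions where `pos % 3 === k - 1`. After the codon containing the slip position, the next codon begins one nucleotide early, which moves reading from frame 2 to frame 1.

```javascript
// Hypothetical sketch: codon start positions (0-based) between `start` and
// `end` when a -1 ribosomal frameshift occurs at position `shiftAt`.
function codonStarts(start, end, shiftAt) {
  const starts = [];
  let pos = start;
  let shifted = false;
  while (pos + 3 <= end) {
    starts.push(pos);
    const containsShift = !shifted && shiftAt >= pos && shiftAt < pos + 3;
    pos += containsShift ? 2 : 3; // -1 slip: the next codon re-reads one nucleotide
    if (containsShift) shifted = true;
  }
  return starts;
}

// Frame number (1-3) of a codon starting at a 0-based position
const frameOf = (pos) => (pos % 3) + 1;
```

For example, `codonStarts(1, 20, 7)` gives codons at 1, 4 and 7 (frame 2) followed by 9, 12 and 15 (frame 1).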