The proportions of “true reads” and “others” showed that the baseline contamination in our lab is around 10% to 20%, but an accidental handling error can raise it to almost 40%. Allio et al. (2020) found very low overall cross-contamination (0.26%) in their shotgun genome sequencing data of swallowtail butterflies, but one sample, Parnassius imperator, had a much higher value, with 26.71% of its contigs contaminated, suggesting that human error can have a sporadic but severe impact on high-throughput sequencing data.
We found that the “others” had a similar composition in the two experiments. If we disregard samples 2 and 3 of the eDNA experiment, the categories within the “others”, ranked from most to least abundant, were “inline index mutation”, “unknown”, “index contamination” and “cross-contamination”.
Because we applied a substitution distance of three base pairs between inline indices, it should be safe to assign reads with a single substitution in their inline index back to the “correct” sample. Indeed, we found that few inline indices had two substitutions. Index mutations could arise in two ways. One is sequencing error, which should be minimal because the error rate is low in the first six base pairs of a read. The other is error introduced while synthesizing the inline indices, so the accuracy of indices synthesized by different providers should be tested in the future.
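The reasoning above can be illustrated with a minimal sketch. If every pair of indices differs at three or more positions, a read with one substitution is at distance one from its true index and at distance two or more from every other index, so the assignment is unambiguous. The index names and 6-bp sequences below are hypothetical examples chosen to satisfy the distance requirement, not the indices used in this study, and the function names are ours.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# Hypothetical 6-bp inline indices (NOT the indices used in the study).
INDICES: Dict[str, str] = {
    "idx01": "ACGTAC",
    "idx02": "ACGATG",
    "idx03": "TGCATG",
}


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indices: Dict[str, str]) -> int:
    """Smallest substitution distance between any two indices in the set."""
    return min(hamming(a, b) for a, b in combinations(indices.values(), 2))


def assign_index(
    observed: str, indices: Dict[str, str], max_mismatch: int = 1
) -> Tuple[Optional[str], Optional[int]]:
    """Return (index name, mismatches) if exactly one index lies within
    max_mismatch substitutions of the observed sequence, else (None, None)."""
    hits = [
        (name, hamming(observed, seq))
        for name, seq in indices.items()
        if hamming(observed, seq) <= max_mismatch
    ]
    if len(hits) == 1:
        return hits[0]
    return (None, None)


# Correction by one substitution is only unambiguous at distance >= 3.
assert min_pairwise_distance(INDICES) >= 3

exact = assign_index("TGCATG", INDICES)      # perfect match
one_sub = assign_index("ACGTAA", INDICES)    # one substitution from idx01
two_subs = assign_index("ACGTGG", INDICES)   # two substitutions: left unassigned
```

Reads carrying two substitutions are deliberately left unassigned here, because at a distance of three they may be as close to a wrong index as to the right one.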
The contaminated reads originated from samples 10, 7 and 1 in the eDNA experiment and from samples 8, 7 and 1 in the chondrichthyan study. The degree of cross-contamination was also not uniform. For example, samples 2 and 3 of the eDNA experiment had 24.93% and 27.01% contamination, respectively, mainly from sample 10, suggesting that handling errors may have been involved. Cross-contamination may spread through tube lids, gloves, pipette tips or aerosols.
The eDNA sample 7 had 12.1% inline index contamination, mostly from one pair of inline indices, suggesting that those indices may have been contaminated by another library prepared in our lab. We suggest that the indices be synthesized one at a time so that inline index contamination can be reduced. Extreme care should be taken when preparing the adapters in the lab, and the adapters should be divided into small aliquots for storage.
For the “unknown” reads, at least one of the sequenced “inline indices” matched none of the indices we used. Our lab has also been using a classic library preparation protocol, which applies no inline index. We suspect that libraries made with the classic protocol may have contaminated common reagents, which in turn contaminated the samples of this project. As a result, the first 6 bp of those reads cannot match any inline index.
Among the “others”, reads with one mutation in the index can be reassigned to their original sample, but reads with mixed inline indices or an unknown index sequence cannot be assigned to any sample and should be excluded from further analyses.
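The handling of each read category can be summarized as a small decision rule. The sketch below is illustrative only and is not the pipeline used in this study. It assumes that each end of a read pair has already been matched against the inline indices, and that “index contamination” corresponds to reads whose two ends carry indices from different pairs; that mapping is our assumption. The names EndMatch and classify_read are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class EndMatch(NamedTuple):
    """Result of matching one end of a read pair to the inline indices."""
    pair_id: Optional[int]  # inline-index pair matched, or None if no match
    mismatches: int         # substitutions relative to the matched index


def classify_read(
    expected_pair: int, left: EndMatch, right: EndMatch
) -> Tuple[str, Optional[int]]:
    """Return (category, sample the read can be assigned to, or None).

    expected_pair is the inline-index pair used for the library being
    analyzed. The category rules are an assumed reading of the text.
    """
    # At least one end matches no known inline index: cannot be assigned.
    if left.pair_id is None or right.pair_id is None:
        return ("unknown", None)
    # The two ends carry indices from different pairs (mixed): exclude.
    if left.pair_id != right.pair_id:
        return ("index contamination", None)
    # Consistent pair, but belonging to another sample: trace to its origin.
    if left.pair_id != expected_pair:
        return ("cross-contamination", left.pair_id)
    # Expected pair with a substitution in an index: correct and keep.
    if left.mismatches > 0 or right.mismatches > 0:
        return ("inline index mutation", expected_pair)
    return ("true read", expected_pair)


# Example: a library built with pair 7 containing a read tagged with pair 10.
category, origin = classify_read(7, EndMatch(10, 0), EndMatch(10, 0))
```

Only the “unknown” and mixed-index categories return no sample, which mirrors the recommendation to exclude those reads from further analyses.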
Most analytical approaches are based on data filtering. For example, CroCo (Simion, 2018) is a database-independent method that can trace cross-contamination from divergent species, but it cannot recognize differences between closely related organisms. ConFindr (Low, 2019) identifies a sample as contaminated if it contains more than one allele of core single-copy ribosomal protein genes. Dickins et al. (2014) proposed a two-part pipeline that identifies contaminated samples based on an unexpected number of variants and a phylogenetic approach.
The inline-index method, however, is independent of the sample sequences, so the composition of the reads can be deduced regardless of how similar the samples are. Reads can be recognized by their inline indices even when a sample contains DNA from unknown species, as in the eDNA data, a situation in which the analytical methods may not work. Kircher et al. (2012) designed a double-index technique to detect jumping PCR by adding indices to both the P5 and P7 Illumina adapters. Peyrégne et al. (2020) implied that the double-index method can be used to monitor contamination. However, because the double indices are added only before sequencing, contamination has many chances to occur before that step. Rohland et al. (2015) invented the inline-index method, which adds a pair of unique barcodes to both ends of the DNA insert to trace contamination. Because they aimed at ancient DNA research, they paid most of their attention to the effect of the unique barcodes on the damage rate and did not test whether the inline-index approach can mitigate problems of cross-contamination. Our results showed that contamination occurred ubiquitously and that unique inline barcodes can be used to trace the source of contaminating reads.
Furthermore, because samples are labeled with inline indices before they are sent to the sequencing facility, cross-contamination arising at the sequencing center can be controlled. Cross-contaminated reads from other samples can be assigned back to their origin based on their inline-index pair, which rescues the data. Finally, with 24 pairs of inline indices plus the P7 index, more samples can be multiplexed and sequenced in the same lane, reducing cost.