1. Schmidt, B., Hildebrandt, A.: Next-generation sequencing: big data meets high performance computing. Drug Discovery Today (2017)
2. Terrizzano, I.G., Schwarz, P.M., Roth, M., Colino, J.E.: Data wrangling: The challenging journey from the wild to the lake. In: CIDR (2015)
3. Mernik, M., Heering, J., Sloane, A.M.: When and how to develop domain-specific languages. ACM computing surveys (CSUR) 37(4), 316–344 (2005)
4. Dyer, R., Nguyen, H.A., Rajan, H., Nguyen, T.N.: Boa: Ultra-large-scale software repository and source-code mining. ACM Transactions on Software Engineering and Methodology (TOSEM) 25(1), 7 (2015)
5. Deus, H.F., Correa, M.C., Stanislaus, R., Miragaia, M., Maass, W., De Lencastre, H., Fox, R., Almeida, J.S.: S3QL: A distributed domain specific language for controlled semantic integration of life sciences data. BMC Bioinformatics 12(1), 285 (2011)
6. Prlić, A., Yates, A., Bliven, S.E., Rose, P.W., Jacobsen, J., Troshin, P.V., Chapman, M., Gao, J., Koh, C.H., Foisy, S., et al.: BioJava: an open-source framework for bioinformatics in 2012. Bioinformatics 28(20), 2693–2695 (2012)
7. Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdigian, C., Fuellen, G., Gilbert, J.G., Korf, I., Lapp, H., et al.: The Bioperl toolkit: Perl modules for the life sciences. Genome research 12(10), 1611–1618 (2002)
8. Cock, P.J., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., et al.: Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11), 1422–1423 (2009)
9. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Communications of the ACM 51(1), 107–113 (2008)
10. Hadoop and MongoDB. https://www.mongodb.com/hadoop-and-mongodb
11. Genomics England. https://www.genomicsengland.co.uk/
12. Turnbull, C., Scott, R.H., Thomas, E., Jones, L., Murugaesu, N., Pretty, F.B., Halai, D., Baple, E., Craig, C., Hamblin, A., et al.: The 100,000 Genomes Project: bringing whole genome sequencing to the NHS. BMJ: British Medical Journal (Online) 361 (2018)
13. Taylor, R.C.: An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinformatics 11 Suppl 12, 1 (2010)
14. Mahadik, K., Wright, C., Zhang, J., Kulkarni, M., Bagchi, S., Chaterji, S.: SARVAVID: A domain specific language for developing scalable computational genomics applications. In: Proceedings of the 2016 International Conference on Supercomputing. ICS ’16, pp. 34:1–34:12. ACM, New York, NY, USA (2016). doi:10.1145/2925426.2926283. http://doi.acm.org/10.1145/2925426.2926283
15. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. Journal of molecular biology 215(3), 403–410 (1990)
16. Leo, S., Santoni, F., Zanetti, G.: Biodoop: bioinformatics on Hadoop. In: Parallel Processing Workshops, 2009. ICPPW’09. International Conference On, pp. 415–422 (2009). IEEE
17. Niemenmaa, M., Kallio, A., Schumacher, A., Klemelä, P., Korpelainen, E., Heljanko, K.: Hadoop-BAM: directly manipulating next generation sequencing data in the cloud. Bioinformatics 28(6), 876–877 (2012). doi:10.1093/bioinformatics/bts054
18. Sadasivam, G.S., Baktavatchalam, G.: A novel approach to multiple sequence alignment using Hadoop data grids. In: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud. MDAC ’10, pp. 2:1–2:7. ACM, New York, NY, USA (2010). doi:10.1145/1779599.1779601. http://doi.acm.org/10.1145/1779599.1779601
19. Langmead, B., Hansen, K.D., Leek, J.T.: Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biol. 11(8), 83 (2010)
20. Alnasir, J., Shanahan, H.: The application of Hadoop in structural bioinformatics. bioRxiv, 376467 (2018)
21. Islam, M.J., Sharma, A., Rajan, H.: A cyberinfrastructure for big data transportation engineering. Journal of Big Data Analytics in Transportation (2019). doi:10.1007/s42421-019-00006-8
22. Smedley, D., Haider, S., Ballester, B., Holland, R., London, D., Thorisson, G., Kasprzyk, A.: BioMart – biological queries made easy. BMC Genomics 10(1), 22 (2009)
23. Drost, H.-G., Paszkowski, J.: Biomartr: genomic data retrieval with R. Bioinformatics 33(8), 1216–1217 (2017)
24. Koonin, E.V., Wolf, Y.I.: Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic acids research 36(21), 6688–6719 (2008)
25. Dede, E., Govindaraju, M., Gunter, D., Canon, R.S., Ramakrishnan, L.: Performance evaluation of a MongoDB and Hadoop platform for scientific data analysis. In: Proceedings of the 4th ACM Workshop on Scientific Cloud Computing, pp. 13–20 (2013). ACM
26. Chodorow, K.: MongoDB: the Definitive Guide: Powerful and Scalable Data Storage. O’Reilly Media, Inc. (2013)
27. Pruitt, K.D., Tatusova, T., Maglott, D.R.: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic acids research 35(suppl_1), 61–65 (2006)
28. Rajan, H.: Bridging the digital divide in data science. In: SPLASH/SPLASH-I’17: The ACM SIGPLAN Conference on Systems, Programming, Languages and Applications: Software for Humanity (2017)
29. Generic Feature Format Version 3. http://gmod.org/wiki/GFF3