Genome assembly is one of the most relevant and computationally complex tasks in genomics projects. It aims to reconstruct a genome by analyzing many small textual fragments of that genome, called reads. Ideally, the reconstructed genome should disregard any errors contained in the reads and combine them optimally, thus recovering the original genome. Assembly quality matters because more reliable genomes lead to a more accurate understanding of the characteristics and functions of living beings, which in turn benefits society, for example in the prevention and treatment of diseases. Assembly becomes even more complex (and is termed de novo) when the assembler software is not supplied with a similar genome to use as a reference. Current assemblers predominantly rely on heuristic strategies on computational graphs. Although these assemblers are widely used in genomics projects, no single assembler is clearly the best for every genome, and choosing an assembler and its configuration properly depends on bioinformatics experts. Reinforcement learning has proven very promising for solving complex tasks without human supervision during the learning process. However, its successful applications are predominantly focused on fictional and entertainment problems, such as games. Based on the above, this work aims to shed light on the application of reinforcement learning to a relevant real-world problem, genome assembly. By expanding the only approach found in the literature that addresses this problem, we carefully explored aspects of the learning of an intelligent agent trained with the Q-learning algorithm, to understand its suitability for scenarios whose characteristics are closer to those faced by real genome projects.
The improvements proposed here include changing the previously proposed reward system and adding strategies to optimize state space exploration, based on dynamic pruning and on mutual collaboration with evolutionary computing. These improvements were evaluated on 23 new environments with larger inputs than those used previously. All these environments are freely available on the internet so that the scientific community can advance this research. The results suggest consistent performance gains from the proposed improvements; however, they also reveal their limitations, especially those related to the high dimensionality of the state and action spaces. Finally, we present paths that can be followed to tackle genome assembly efficiently in real scenarios, drawing on recent successful reinforcement learning applications (including deep reinforcement learning) from other domains that deal with high-dimensional inputs.
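To make the reinforcement learning formulation more concrete, the following is a minimal sketch, not the environment or reward system evaluated in this work. It assumes a toy setting in which a state is the ordered tuple of reads already chosen, an action appends a read that has not been used yet, and the reward is the suffix-prefix overlap between consecutive reads. The function names (overlap, q_learning_assembly, merge), the hyperparameter values, and the example reads are all illustrative assumptions.

```python
import random
from collections import defaultdict


def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def q_learning_assembly(reads, episodes=2000, alpha=0.5, gamma=0.9,
                        epsilon=0.2, seed=0):
    """Tabular Q-learning over read orderings (illustrative assumption).

    State: tuple of read indices chosen so far, in order.
    Action: index of a read not yet in the state.
    Reward: overlap between the last chosen read and the new one.
    """
    rng = random.Random(seed)
    n = len(reads)
    q = defaultdict(float)  # (state, action) -> estimated return

    def actions(state):
        return [i for i in range(n) if i not in state]

    for _ in range(episodes):
        state = ()
        while len(state) < n:
            available = actions(state)
            if rng.random() < epsilon:
                action = rng.choice(available)  # explore
            else:
                action = max(available, key=lambda x: q[(state, x)])  # exploit
            reward = overlap(reads[state[-1]], reads[action]) if state else 0
            next_state = state + (action,)
            # Best value reachable from the next state (0 if it is terminal).
            future = max((q[(next_state, x)] for x in actions(next_state)),
                         default=0.0)
            q[(state, action)] += alpha * (
                reward + gamma * future - q[(state, action)]
            )
            state = next_state

    # Greedy rollout with the learned Q-table.
    state = ()
    while len(state) < n:
        state += (max(actions(state), key=lambda x: q[(state, x)]),)
    return list(state)


def merge(reads, order):
    """Concatenate reads in the given order, collapsing their overlaps."""
    result = reads[order[0]]
    for prev, cur in zip(order, order[1:]):
        result += reads[cur][overlap(reads[prev], reads[cur]):]
    return result


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "TACGGA", "GGATTC"]  # hypothetical example reads
    learned_order = q_learning_assembly(toy_reads)
    print(learned_order, merge(toy_reads, learned_order))
```

Because a state in this sketch is an ordered tuple of reads, the number of states grows factorially with the number of reads, which gives a sense of why the dimensionality of the state and action spaces becomes the limiting factor on larger inputs.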