The randomness of equipment faults greatly complicates maintenance work, so disassembly planning for faulty parts must be carried out on site and efficiently, and presented in combination with virtual reality (VR) technology to achieve rapid repair. As a promising method for dynamic and stochastic problems, deep reinforcement learning (DRL) is adopted in this paper to solve adaptive disassembly sequence planning (DSP) in a VR maintenance training system, in which sequences are generated dynamically from user inputs. A disassembly Petri net is established to describe and model the disassembly process, and the DSP problem is then formulated as a Markov decision process (MDP) that can be solved by a deep Q-network (DQN). To handle temporal credit assignment under sparse rewards, the long-term return in the DQN is replaced with the fitness function of a genetic algorithm (GA). Meanwhile, the gradient-descent update of the DQN is adopted to speed up the evolution of the GA population. A case study demonstrates that the proposed method provides better solutions to DSP problems for VR maintenance training.
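To make the core idea concrete, the following is a minimal toy sketch (not the paper's actual implementation) of the hybrid scheme: a precedence-constrained disassembly task stands in for the Petri net's enabling conditions, a tabular Q function stands in for the DQN, and the update target for every visited state-action pair is the fitness of the completed sequence rather than a bootstrapped long-term return. All part names, costs, and the fitness function are hypothetical illustrations.

```python
import random

# Hypothetical precedence constraints: part -> parts that must be removed
# first. This mimics the enabling conditions of a disassembly Petri net.
PRECEDENCE = {"cover": set(), "fan": {"cover"}, "board": {"cover"},
              "cpu": {"board", "fan"}}
# Hypothetical per-part removal costs used by the GA-style fitness.
COST = {"cover": 1.0, "fan": 2.0, "board": 3.0, "cpu": 1.5}

def enabled(removed):
    """Parts whose preconditions are met (enabled transitions)."""
    return [p for p in PRECEDENCE
            if p not in removed and PRECEDENCE[p] <= removed]

def fitness(sequence):
    """GA-style fitness of a complete sequence: position-weighted cost,
    negated so that cheaper (better) sequences score higher."""
    return -sum(COST[p] * (i + 1) for i, p in enumerate(sequence))

def train(episodes=500, eps=0.2, lr=0.5, seed=0):
    rng = random.Random(seed)
    Q = {}  # (frozenset of removed parts, next part) -> value
    for _ in range(episodes):
        removed, seq, visited = set(), [], []
        while len(removed) < len(PRECEDENCE):
            acts = enabled(removed)
            if rng.random() < eps:          # epsilon-greedy exploration
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda p: Q.get((frozenset(removed), p), 0.0))
            visited.append((frozenset(removed), a))
            removed.add(a)
            seq.append(a)
        # Sparse signal: only the finished sequence is scored. The fitness
        # replaces the bootstrapped long-term return as the update target.
        f = fitness(seq)
        for s, a in visited:
            Q[(s, a)] = Q.get((s, a), 0.0) + lr * (f - Q.get((s, a), 0.0))
    return Q

def best_sequence(Q):
    """Greedy rollout of the learned Q values."""
    removed, seq = set(), []
    while len(removed) < len(PRECEDENCE):
        a = max(enabled(removed),
                key=lambda p: Q.get((frozenset(removed), p), 0.0))
        removed.add(a)
        seq.append(a)
    return seq
```

Because every generated sequence draws actions only from the enabled set, feasibility with respect to the precedence constraints is guaranteed by construction; the learning problem reduces to ranking the feasible orderings by fitness, which is exactly where the GA fitness substitutes for the sparse, delayed reward.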