Robot grasping and sorting is an important task in intelligent manufacturing. However, traditional 2D camera-based robotic-arm grasping methods suffer from low efficiency and low accuracy in scenes with stacked and occluded objects. To address these issues, this paper proposes a novel pushing-grasping collaborative method based on a deep Q-network with dual viewpoints. The method adopts an improved deep Q-network algorithm, using an RGB-D camera to obtain RGB images and point clouds of the objects from two viewpoints, and combines pushing and grasping actions so that the trained manipulator can rearrange cluttered scenes into configurations more favorable for grasping, allowing it to perform well in more complicated grasping scenes. Furthermore, we improve the reward function of the deep Q-network by proposing a piecewise reward function that speeds up convergence. We trained different models and compared different methods in the V-REP simulation environment; the results show that the proposed method converges quickly and achieves a grasping success rate of 83.5% in unstructured scenes. Moreover, the method generalizes well, performing reliably on novel objects that the manipulator has never grasped before.
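A piecewise reward of the kind described above can be sketched as follows. This is a minimal illustration only: the abstract does not give the paper's actual reward constants or conditions, so the branch values and the `scene_changed` criterion here are assumptions chosen to show the structure (reward the grasp success strongly, give a smaller shaping reward to a push that usefully changes the scene, and penalize ineffective actions).

```python
def piecewise_reward(action: str, grasp_success: bool, scene_changed: bool) -> float:
    """Piecewise reward for a pushing-grasping agent.

    NOTE: the numeric values below are illustrative assumptions,
    not the constants used in the paper.
    """
    if action == "grasp":
        # A successful grasp gets the largest reward; a failed
        # attempt is penalized to discourage blind grasping.
        return 1.0 if grasp_success else -0.5
    if action == "push":
        # A push is rewarded only if it changed the scene (e.g. the
        # depth heightmap differs beyond a threshold), making a
        # later grasp easier; useless pushes are mildly penalized.
        return 0.5 if scene_changed else -0.25
    return 0.0  # no-op / invalid action
```

Shaping the push reward this way (rather than rewarding only grasps) is what lets the Q-network learn pushes as a means to an end and tends to speed up convergence in cluttered scenes.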