Applying deep neural network models to robot-arm grasping tasks requires the laborious and time-consuming annotation of a large number of representative examples in the training process. Accordingly, this work proposes a two-stage grasping model, in which the first stage employs learning-based template matching (LTM) algorithm for estimating the object position, and a self-rotation learning (SRL) network is then proposed to estimate the rotation angle of the grasping objects in the second stage. The LTM algorithm measures similarity between the feature maps of the search and template images which are extracted by a pre-trained model. While the SRL network performs the automatic rotation and labelling of the input data for training purposes. Therefore, the proposed model does not consume an expensive human-annotation process. The experimental results show that the proposed model obtains 92.6% when testing on 2400 pairs of the template and target images. Moreover, in performing practical grasping tasks on a NVidia Jetson TX2 developer kit, the proposed model achieves a higher accuracy (88.5%) than other grasping approaches on a split of Cornell-grasp dataset.