Convolutional neural networks (CNNs) resemble the human visual cortex in both their structure and their response to images. Their learning processes, however, are quite different: CNNs are generally trained with gradient descent algorithms grounded in complex mathematical analysis, whereas the human brain seems to learn to recognize objects in a more intuitive way, by extracting and remembering the features of an object. This difference raises an interesting question: could a CNN recognize objects in a more humanlike way? In this work, we propose a straightforward but novel "extract-remember" model (ERM). First, we observe that different image classes stably trigger the activation of different groups of neurons even in a randomly weighted CNN, which indicates that the activated neurons jointly represent the core features of the stimuli. Second, a simple remember operation is applied: during training, we identify the commonly activated neuron group (CANG) associated with each object class instead of adjusting the large number of weight values. Once the object classes have been mapped to their CANGs, the CNN can directly recognize the object in a new image by determining whether its response contains one of the remembered CANGs. Recognition results on different image datasets demonstrate the efficiency of ERM. The proposed ERM is highly interpretable and suggests a new direction for artificial neural network design, which may in turn help in understanding the object recognition process of the human brain.
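The extract and remember steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: a single randomly weighted fully connected layer stands in for the randomly weighted CNN, and the function names, the `min_fraction` threshold, and the overlap-based scoring rule are all assumptions made for illustration.

```python
import random


def make_random_layer(n_inputs, n_neurons, seed=0):
    """Fixed random weights that are never trained (stand-in for a random CNN)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_neurons)]


def active_neurons(weights, x):
    """Extract: indices of neurons whose pre-activation is positive (ReLU > 0)."""
    return {
        i
        for i, row in enumerate(weights)
        if sum(w * v for w, v in zip(row, x)) > 0.0
    }


def remember_cang(weights, samples, min_fraction=0.9):
    """Remember: neurons active in at least `min_fraction` of a class's samples.

    `min_fraction` is a hypothetical threshold, not a value from the paper.
    """
    counts = {}
    for x in samples:
        for i in active_neurons(weights, x):
            counts[i] = counts.get(i, 0) + 1
    needed = min_fraction * len(samples)
    return {i for i, c in counts.items() if c >= needed}


def recognize(weights, cangs, x):
    """Return the class whose remembered CANG is best covered by the response.

    The score (fraction of the CANG that is active) is an assumed rule.
    """
    active = active_neurons(weights, x)

    def score(label):
        cang = cangs[label]
        return len(cang & active) / len(cang) if cang else 0.0

    return max(cangs, key=score)
```

A typical use would build one layer with `make_random_layer`, call `remember_cang` once per class on that class's training inputs, collect the results in a dictionary keyed by class label, and pass that dictionary to `recognize` for each new input. No weight is updated at any point; the only thing learned is the set of neuron indices stored for each class.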