As an integrated application of modern information technologies and artificial intelligence, Prognostics and Health Management (PHM) is important for machine health monitoring. Tool wear prediction is one of the representative applications of PHM technology in modern manufacturing systems and industry. In this paper, a multi-scale Convolutional Gated Recurrent Unit network (MCGRU) is proposed to process raw sensory data for tool wear prediction. At the bottom of the MCGRU, six parallel and independent branches with different kernel sizes form a multi-scale convolutional neural network, which improves the network's adaptability to features at different time scales. The multi-scale features extracted from the raw data are then fed into a deep Gated Recurrent Unit (GRU) network to capture long-term dependencies and learn informative representations. At the top of the MCGRU, a fully connected layer and a regression layer are built for cutting tool wear prediction. Two case studies are performed to verify the capability and effectiveness of the proposed MCGRU network, and the results show that MCGRU outperforms several state-of-the-art baseline models.
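To make the data flow described above concrete, the following is a minimal, untrained forward-pass sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the kernel sizes, the hidden sizes, the two-layer GRU depth, the length-preserving zero padding, the single-channel input, the omission of bias terms, and the names `conv1d_same`, `multi_scale_features`, `GRULayer`, and `predict_wear` are all hypothetical choices. The abstract specifies only six parallel convolutional branches with different kernel sizes, a deep GRU network, a fully connected layer, and a regression layer.

```python
import math
import random


def conv1d_same(signal, kernel):
    """1-D cross-correlation + ReLU, zero-padded so output length == input length."""
    k = len(kernel)
    left = (k - 1) // 2
    padded = [0.0] * left + list(signal) + [0.0] * (k - 1 - left)
    return [max(0.0, sum(w * padded[t + i] for i, w in enumerate(kernel)))
            for t in range(len(signal))]


def multi_scale_features(signal, kernels):
    """Run one independent branch per kernel; stack branch outputs per time step."""
    branches = [conv1d_same(signal, k) for k in kernels]
    return [list(step) for step in zip(*branches)]  # shape: (time, n_branches)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRULayer:
    """Single GRU layer (biases omitted for brevity); weights act on [x, h]."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.Wz = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # update gate
        self.Wr = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # reset gate
        self.Wh = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # candidate state

    def forward(self, sequence):
        h = [0.0] * self.hid_dim
        outputs = []
        for x in sequence:
            z = [sigmoid(v) for v in matvec(self.Wz, x + h)]
            r = [sigmoid(v) for v in matvec(self.Wr, x + h)]
            gated = [ri * hi for ri, hi in zip(r, h)]
            cand = [math.tanh(v) for v in matvec(self.Wh, x + gated)]
            h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
            outputs.append(h)
        return outputs


def predict_wear(signal, seed=0):
    """Forward pass with random (untrained) weights; returns one scalar."""
    rng = random.Random(seed)
    kernel_sizes = [2, 4, 8, 16, 32, 64]  # assumed: six different scales
    kernels = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for k in kernel_sizes]
    feats = multi_scale_features(signal, kernels)      # multi-scale CNN (bottom)
    gru1 = GRULayer(len(kernels), 8, rng)              # assumed depth: 2 layers
    gru2 = GRULayer(8, 8, rng)
    last = gru2.forward(gru1.forward(feats))[-1]       # final hidden state
    fc = rand_matrix(4, 8, rng)                        # fully connected layer
    dense = [max(0.0, v) for v in matvec(fc, last)]
    out_w = [rng.uniform(-0.1, 0.1) for _ in range(4)]  # regression layer (top)
    return sum(w * d for w, d in zip(out_w, dense))


# Usage: a synthetic single-channel signal stands in for raw sensor data.
signal = [math.sin(0.1 * t) for t in range(100)]
wear = predict_wear(signal)
```

Because the weights are random, the returned value is meaningless as a wear estimate; the sketch only shows how multi-scale convolutional features, recurrent layers, and a regression head could be chained.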