Specific surface area is a major index of cement quality, so the accuracy of its predicted value plays an important guiding role in cement production and scheduling. However, data from the cement grinding process exhibit time delay, strong coupling and nonlinearity, and traditional prediction models rely on static, small-sample feature acquisition based on time correlation; as a result, they suffer from poor data feature representation and low accuracy. To predict cement specific surface area accurately, we propose a deep learning prediction model based on the Dual Temporal Extraction Network (DTENet). The network consists of two parts, an encoder and a decoder. The encoder uses a dual temporal feature extraction mechanism. Unlike the LSTM network, which focuses only on the correlation information of a single time step, we construct a two-stage sliding window that sends the data into different temporal feature extraction networks, thereby extracting short-term and long-term temporal features separately. This design can greatly improve the prediction accuracy for cement specific surface area data with large time lag, redundancy and variable working conditions. In the decoder, we use a channel attention approach to enhance the spatial information extraction capability of the model. Experimental results show that our model achieves higher accuracy in cement specific surface area prediction than the LSTM, XGBoost and ARIMA models.
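
As a rough illustration of the two-stage sliding window idea, the following Python sketch cuts a multivariate process series into long windows and then cuts each long window into shorter ones, so that two different temporal feature extractors could be fed with long-term and short-term views of the same data. The function names, the window lengths and the choice to nest the short windows inside the long ones are assumptions made for this example; they are not taken from the DTENet implementation.

```python
import numpy as np


def sliding_windows(series, length, stride=1):
    """Cut a (time, features) array into overlapping windows.

    Returns an array of shape (num_windows, length, features).
    """
    series = np.asarray(series)
    if len(series) < length:
        raise ValueError("series is shorter than the window length")
    starts = range(0, len(series) - length + 1, stride)
    return np.stack([series[s:s + length] for s in starts])


def two_stage_windows(series, long_len, short_len, short_stride=1):
    """Hypothetical two-stage windowing (an assumption, not the paper's code).

    Stage 1 slides a long window over the whole series (long-term view).
    Stage 2 slides a short window inside each long window (short-term view).
    """
    long_w = sliding_windows(series, long_len)
    short_w = np.stack(
        [sliding_windows(w, short_len, short_stride) for w in long_w]
    )
    return long_w, short_w


if __name__ == "__main__":
    # Toy data: 10 time steps, 3 process variables (values are arbitrary).
    data = np.arange(30, dtype=float).reshape(10, 3)
    long_w, short_w = two_stage_windows(data, long_len=6, short_len=3)
    print(long_w.shape)   # (num_long, long_len, features)
    print(short_w.shape)  # (num_long, num_short, short_len, features)
```

In this sketch the long windows would go to a long-term extractor and the nested short windows to a short-term extractor; how the two feature sets are fused before the decoder is left open here.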