We develop the machine learning capability to predict a time sequence of in-situ transmission electron microscopy (TEM) video frames based on the combined long-short-term-memory (LSTM) algorithm and the features de-entanglement method. We train deep learning models to predict a sequence of future video frames based on the input of a sequence of previous frames. This unique capability provides insight into size dependent structural changes in Au nanoparticles under dynamic reaction condition using in-situ environmental TEM data, informing models of morphological evolution and catalytic properties. The model performance and achieved accuracy of predictions are desirable based on, for scientific data characteristic, a limited size of training data sets. The model convergence and values for the loss function mean square error show dependence on the training strategy, and structural similarity measure between predicted structure images and ground truth reaches the value of about 0.7. This computed structural similarity is smaller than values obtained when the deep learning architecture is trained using much larger benchmark data sets, it is sufficient to show the structural transition of Au nanoparticles. While performance parameters of our model applied to scientific data fall short of those achieved for the non-scientific big data sets, we demonstrate model ability to predict the evolution, even including the particle structural phase transformation, of Au nano particles as catalyst for CO oxidation under the chemical reaction conditions. Using this approach, it may be possible to anticipate the next steps of a chemical reaction for emerging automated experimentation platforms.