Human joints have many degrees of freedom and can move and rotate in different directions, which makes posture motion features difficult to capture accurately and lowers the accuracy of joint motion prediction. To address this issue, this paper proposes a joint-based method for predicting and verifying human motion that relies on AVI video conversion. A Gaussian background model extracts the foreground of the human motion images in the AVI video, and the video is converted to 3D by fusing the foreground and background images. Spatio-temporal weighted posture motion features are extracted from the 3D video frames and used as input to a convolutional neural network (CNN). The motion features are vectorized, and a spatio-temporal weighted adaptive interpolation method reduces motion edge detection errors. The posture edge features are then fused to generate the motion basis. A particle filter algorithm establishes the human joint motion model, and joint-based motion prediction is carried out from the motion basis. Experimental results show that the 3D conversion enhances the background depth of the two-dimensional AVI video. The extracted motion bases are clear, represent the actions accurately, have smooth outlines, and contain no redundant background. The joint-based predictions of human movement are accurate, with the error relative to the actual movement remaining within a controllable range.
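
As an illustration of the foreground extraction step, the following is a minimal sketch of a per-pixel Gaussian background model. It is not the paper's implementation: it assumes a single running Gaussian per pixel (the text does not say whether a single Gaussian or a mixture is used), grayscale frames already decoded into a NumPy array, and a first frame that contains only background. The function name `gaussian_foreground_masks` and the parameters `alpha`, `k`, `init_var`, and `min_var` are hypothetical and chosen only for this example.

```python
import numpy as np


def gaussian_foreground_masks(frames, alpha=0.05, k=2.5,
                              init_var=225.0, min_var=4.0):
    """Label foreground pixels with a single running Gaussian per pixel.

    frames   : array of shape (T, H, W), grayscale intensities.
    alpha    : learning rate for the background mean and variance (assumed).
    k        : a pixel is foreground if it deviates from the background
               mean by more than k standard deviations (assumed).
    init_var : initial per-pixel variance (assumed).
    min_var  : lower bound that keeps the variance from collapsing (assumed).

    Returns a boolean array of shape (T, H, W); True marks foreground.
    The first frame only initialises the model, so its mask is all False.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mean = frames[0].copy()                      # background mean per pixel
    var = np.full(frames[0].shape, init_var)     # background variance per pixel
    masks = np.zeros(frames.shape, dtype=bool)

    for t in range(1, len(frames)):
        diff = frames[t] - mean
        # Foreground test: squared deviation exceeds k^2 * variance.
        fg = diff ** 2 > (k ** 2) * var
        masks[t] = fg

        # Update the model only where the pixel was judged to be background,
        # so that moving objects do not bleed into the background estimate.
        bg = ~fg
        mean[bg] += alpha * diff[bg]
        var[bg] = np.maximum(
            (1.0 - alpha) * var[bg] + alpha * diff[bg] ** 2, min_var
        )

    return masks
```

Each returned mask can be multiplied with its frame to isolate the moving figure. Updating the model only at background pixels is a design choice of this sketch; it keeps a slowly moving person from being absorbed into the background, at the cost of adapting more slowly to genuine scene changes.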