This paper proposes a Spatiotemporal Variational Autoencoder (ST-VAE) that captures the spatiotemporal features of pedestrian behavior to improve the accuracy of pedestrian trajectory prediction at intersections. To model the complex relationships in pedestrian social interactions, the paper designs a Social Graph Attention Network (SGAT) that dynamically weights the relative importance of traffic participants. The paper also introduces a Complex Gated Recurrent Unit (CGRU) to capture the dynamic changes in time-series data and thereby improve future trajectory predictions. To reduce bias and improve prediction accuracy, the paper designs a Diversity-Enhanced Final Position Clustering (DEFPC) method, applied as a post-processing step. On four public datasets, ST-VAE reduces the Average Displacement Error (ADE) and Final Displacement Error (FDE) by 27.2% and 18.2%, respectively, compared with SocialCVAE, the best-performing baseline. Ablation studies quantify the contribution of each component and confirm the effectiveness and robustness of ST-VAE across diverse environments and complex interaction scenarios. Overall, ST-VAE offers a new approach to pedestrian trajectory prediction and provides technical support for autonomous driving planning systems, illustrating the potential of deep learning in intelligent transportation systems.
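For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how ADE and FDE are commonly computed for a single predicted trajectory. It is not taken from the paper's implementation: the function name `displacement_errors`, the list-of-`(x, y)` representation, and the sample coordinates are all assumptions made for illustration.

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory.

    pred and truth are equal-length lists of (x, y) positions,
    one per future time step (hypothetical representation).
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at each step.
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # mean error over all predicted steps
    fde = dists[-1]                # error at the final predicted step only
    return ade, fde

# Illustrative values only; these are not data from the paper.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 3.0), (2.0, 4.0)]
ade, fde = displacement_errors(pred, truth)
print(f"ADE={ade:.3f}, FDE={fde:.3f}")  # prints ADE=2.333, FDE=4.000
```

Under this definition, ADE averages the per-step error over the whole prediction horizon, while FDE looks only at the endpoint, so the two can improve by different amounts, as in the 27.2% and 18.2% reductions reported above.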