Spatially resolved transcriptomics (SRT) data provide critical insights into gene expression patterns within tissue contexts, making effective methods for identifying spatial domains essential. Traditional clustering techniques often overlook spatial information, leading to disjointed domains. Current computational approaches integrate spatial information but still struggle with recognizing domain boundaries, scaling to large datasets, and the need for a separate clustering step. We introduce stDyer, an end-to-end deep learning framework for spatial domain clustering in SRT data. stDyer combines a Gaussian Mixture Variational AutoEncoder (GMVAE) with graph attention networks (GATs) to learn deep representations and cluster units simultaneously. A unique feature of stDyer is its use of dynamic graphs, which adaptively link units based on their Gaussian mixture assignments in the latent space, thereby improving spatial domain clustering and producing smoother domain boundaries. Additionally, stDyer's mini-batch neighbor sampling strategy allows it to scale to large datasets and enables multi-GPU training. Benchmarked against state-of-the-art tools across various SRT technologies, stDyer demonstrates superior performance in spatial domain clustering, multi-slice analysis, and large-scale dataset handling.
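
To make the idea of a dynamic graph concrete, the following is a minimal sketch and not stDyer's actual implementation. It assumes one simple linking rule that is not stated in the text: each unit is connected to its k spatially nearest units that currently share its mixture assignment, so the edges change whenever the assignments change. The function name `dynamic_neighbors` and its parameters are hypothetical, and the dense distance matrix is only suitable for small inputs (it does not reflect the mini-batch sampling strategy).

```python
import numpy as np


def dynamic_neighbors(coords, labels, k):
    """Hypothetical rule: link each unit to its k nearest same-label units.

    coords: (n, d) spatial coordinates of the units.
    labels: (n,) current mixture-component assignment of each unit.
    Returns a list of neighbor-index lists, one per unit.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    n = coords.shape[0]

    # Dense pairwise squared Euclidean distances (O(n^2) memory).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = (diff ** 2).sum(axis=-1)

    idx = np.arange(n)
    neighbors = []
    for i in range(n):
        # Candidate neighbors: other units with the same assignment.
        same = np.flatnonzero((labels == labels[i]) & (idx != i))
        # Sort candidates by spatial distance and keep the closest k.
        order = same[np.argsort(dist[i, same], kind="stable")]
        neighbors.append(order[:k].tolist())
    return neighbors
```

Under this assumed rule, a unit lying next to a domain boundary is linked to units on its own side of the boundary rather than to its closest neighbor across it. Re-running the function after each update of the assignments gives a graph that evolves with the clustering.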