The evolution of drug resistance leads to treatment failure and tumor progression. Intermittent androgen deprivation therapy (IADT) helps responsive cancer cells compete with resistant cancer cells within the tumor. However, conventional IADT follows a population-based schedule and ignores the heterogeneous phenotypes of individual patients. To address this challenge, we developed a time-varied, mixed-effect, generative Lotka-Volterra (tM-GLV) model that accounts for heterogeneity in both the evolution mechanism and the pharmacokinetics of individual patients. We then proposed a reinforcement learning-enabled individualized IADT framework, I2ADT, which learns patient-specific tumor dynamics and derives the optimal drug administration policy. Experiments with clinical trial data demonstrated that I2ADT can significantly prolong the time to progression of prostate cancer patients while reducing the cumulative drug dosage. This research illustrates how reinforcement learning techniques can be applied to identify personalized adaptive cancer therapy.
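To make the competition idea concrete, the following is a minimal sketch of a generic two-population competitive Lotka-Volterra system under an on/off treatment schedule. It is not the tM-GLV model and it does not use reinforcement learning: the equations, all parameter values, the function names (`simulate`, `continuous`, `intermittent`), and the threshold rule standing in for a learned policy are assumptions chosen purely for illustration.

```python
# Illustrative only: a generic competitive Lotka-Volterra model with
# hypothetical parameters, NOT the tM-GLV model described in the abstract.

def continuous(total, on):
    """Baseline schedule: the drug is always on."""
    return True


def intermittent(total, on, upper=0.5, lower=0.25):
    """Hypothetical threshold rule standing in for a learned policy:
    start treatment above `upper`, stop below `lower`, else keep state."""
    if total > upper:
        return True
    if total < lower:
        return False
    return on


def simulate(policy, days=2000, dt=0.1, s0=0.5, r0=0.01):
    """Forward-Euler integration of responsive (s) and resistant (r) cells.

    Returns the final s, final r, and cumulative time on treatment.
    """
    r_s, r_r = 0.03, 0.02    # assumed growth rates per day
    a_sr, a_rs = 0.8, 1.2    # assumed competition coefficients
    capacity = 1.0           # assumed shared carrying capacity
    kill = 0.06              # assumed drug-induced death rate of s

    s, r = s0, r0
    on = False
    dose = 0.0
    for _ in range(int(round(days / dt))):
        on = policy(s + r, on)
        u = 1.0 if on else 0.0
        # Responsive cells are suppressed by the drug; resistant cells are not.
        ds = r_s * s * (1.0 - (s + a_sr * r) / capacity) - kill * u * s
        # Resistant cells are held back only by competition with s.
        dr = r_r * r * (1.0 - (r + a_rs * s) / capacity)
        s = max(s + dt * ds, 0.0)
        r = max(r + dt * dr, 0.0)
        dose += u * dt
    return s, r, dose


if __name__ == "__main__":
    for name, policy in [("continuous", continuous), ("intermittent", intermittent)]:
        s, r, dose = simulate(policy)
        print(f"{name}: responsive={s:.3f} resistant={r:.3f} days_on_drug={dose:.1f}")
```

In this toy setting the schedule is a fixed rule on total tumor burden. In the framework described above, that rule would be replaced by a policy learned per patient, and the fixed parameters by patient-specific estimates.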