Generating coherent, contextually relevant text is increasingly important across a wide range of applications, which motivates more sophisticated language models. We present a next-phrase prediction approach within the Llama 2 architecture that improves both the accuracy and the efficiency of text generation relative to traditional next-word prediction. The modified model combines a dual-stage encoder-decoder framework, integrated attention mechanisms, and reinforcement learning techniques, and it achieves substantial improvements in BLEU and ROUGE scores together with reductions in perplexity, latency, and computational resource usage. Extensive evaluations across diverse datasets indicate that the model is robust and generalizes well, suggesting its potential to advance applications that depend on advanced language modeling. The results underscore the importance of continued innovation in model architectures and training methodologies to meet the growing demands of natural language processing tasks. By systematically addressing the limitations of existing approaches, this study contributes insights and methods that support more efficient and accurate language models for real-time applications.