The increasing demand for higher data rates in Beyond 5G (B5G) networks necessitates the effective deployment of small cells (SCs) to ensure optimal Quality of Service (QoS) for cell-edge user equipment (CEUE) in densely populated areas. Our study introduces a two-step algorithm that leverages reinforcement learning for dynamic resource matching and power allocation to maximize network performance. First, the algorithm employs a novel Q-learning approach to optimally pair user equipment (UEs) with available resources, maximizing sum-rate while minimizing interference at each step. Second, we refine the power allocation using a differential evolution algorithm that adapts to varying network conditions, ensuring robust QoS for CEUEs. The proposed method not only improves the matching and power allocation processes but also introduces adaptive thresholds for real-time network adjustments, yielding a substantial performance gain over existing methods. Our results indicate improvements of at least 15% in sum-rate and 35% in interference reduction compared with conventional techniques. This study underscores the efficiency of the integrated reinforcement learning approach in dense B5G environments and lays a foundation for future research in autonomous network management.
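
To make the first step concrete, below is a minimal tabular Q-learning sketch of the UE-to-resource matching: the state is the next UE to place, an action assigns it a free resource block, and the per-step reward is the rate gain penalized by interference from pairs already placed. All dimensions, channel gains, and the interference model here are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_UE, N_RB = 4, 4                  # hypothetical: 4 UEs, 4 resource blocks
EPISODES, ALPHA, GAMMA, EPS = 2000, 0.1, 0.9, 0.1

# Hypothetical channel gains; the paper's channel model is not specified here.
gain = rng.uniform(0.5, 2.0, (N_UE, N_RB))

def step_reward(ue, rb, assigned):
    """Rate gain of pairing `ue` with `rb`, penalized by interference
    from pairs already placed on the same or adjacent resource blocks."""
    interference = sum(0.2 * gain[u, rb] for u, r in assigned if abs(r - rb) <= 1)
    return np.log2(1.0 + gain[ue, rb] / (1.0 + interference))

Q = np.zeros((N_UE, N_RB))         # tabular Q: state = next UE to place, action = RB

for _ in range(EPISODES):
    free, assigned = list(range(N_RB)), []
    for ue in range(N_UE):         # place one UE per step
        if rng.random() < EPS:     # epsilon-greedy exploration
            rb = int(rng.choice(free))
        else:
            rb = max(free, key=lambda r: Q[ue, r])
        r = step_reward(ue, rb, assigned)
        free.remove(rb)
        assigned.append((ue, rb))
        nxt = max((Q[ue + 1, rr] for rr in free), default=0.0) if ue + 1 < N_UE else 0.0
        Q[ue, rb] += ALPHA * (r + GAMMA * nxt - Q[ue, rb])

# Extract the greedy pairing from the learned Q-table.
free = list(range(N_RB))
for ue in range(N_UE):
    rb = max(free, key=lambda r: Q[ue, r])
    free.remove(rb)
    print(f"UE {ue} -> RB {rb}")
```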
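
The second step can be sketched with an off-the-shelf differential evolution solver: the objective is the negated sum-rate plus a penalty whenever a cell-edge UE falls below its QoS rate floor. SciPy's `differential_evolution` performs the search; the gains, power bounds, `R_MIN` threshold, and CEUE indices below are hypothetical placeholders, not the paper's parameters.

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(1)

N_UE = 4
P_MAX = 1.0                      # per-UE transmit-power cap (hypothetical, in W)
NOISE = 0.1                      # noise power (hypothetical)
R_MIN = 0.5                      # QoS rate floor for CEUEs (hypothetical, bit/s/Hz)
cell_edge = [2, 3]               # indices of cell-edge UEs (hypothetical)

gain = rng.uniform(0.5, 2.0, N_UE)            # direct-link channel gains
xgain = rng.uniform(0.01, 0.1, (N_UE, N_UE))  # cross-interference gains
np.fill_diagonal(xgain, 0.0)                  # no self-interference

def rates(p):
    """Per-UE achievable rate under interference-limited SINR."""
    interference = xgain @ p
    return np.log2(1.0 + gain * p / (NOISE + interference))

def objective(p):
    """DE minimizes, so negate sum-rate and penalize CEUE QoS violations."""
    r = rates(p)
    penalty = 10.0 * sum(max(0.0, R_MIN - r[i]) for i in cell_edge)
    return -r.sum() + penalty

result = differential_evolution(objective, bounds=[(0.0, P_MAX)] * N_UE,
                                seed=1, maxiter=300, tol=1e-6)
print("power allocation:", np.round(result.x, 3))
print("per-UE rates:", np.round(rates(result.x), 3))
```

The fixed penalty weight and `R_MIN` stand in for the adaptive thresholds mentioned above; in the full method those would be adjusted in real time as network conditions vary.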