Cloud computing provides shared computing resources that can be accessed over the Internet. When cloud data centers are flooded with end-user requests, managing virtual machines efficiently so as to balance economic cost against quality of service (QoS) becomes a mandatory task for service providers. Virtual machine migration brings stakeholders many benefits in terms of cost, energy, performance, stability, and availability. However, stakeholders' objectives usually conflict with one another. Moreover, optimal resource allocation in cloud infrastructure is usually an NP-hard or NP-complete problem. In this paper, the virtual machine migration problem is formulated using game theory to ensure both load balance and resource utilization. A virtual machine migration algorithm, named V2PQL, is proposed based on a Markov Decision Process and the Q-learning algorithm. Simulation results, divided into a training phase and a policy extraction phase, demonstrate the efficiency of our proposal. The proposed V2PQL policy is benchmarked against the Round-Robin policy to highlight its strength and feasibility in the policy extraction phase.
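To make the two phases concrete, the following is a minimal sketch of generic tabular Q-learning, not the V2PQL algorithm itself. It shows one update step of the kind used in a training phase, a greedy lookup of the kind used in policy extraction, and a Round-Robin baseline. The state labels, the reward value, the learning rate `alpha`, the discount factor `gamma`, and all function names are hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step (training phase).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    alpha and gamma are assumed values, not taken from the paper.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state, actions):
    """Policy extraction: choose the action with the highest learned Q-value."""
    return max(actions, key=lambda a: q[(state, a)])


def round_robin(num_hosts):
    """Baseline policy: cycle through host indices regardless of system state."""
    i = 0
    while True:
        yield i
        i = (i + 1) % num_hosts


# Hypothetical toy setting: actions are target host indices for a migration.
hosts = [0, 1]
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# Assume migrating to host 1 from state "s0" earned reward 1.0 and led to "s1".
q_update(q_table, "s0", 1, 1.0, "s1", hosts)

chosen = greedy_action(q_table, "s0", hosts)  # learned policy for state "s0"
baseline = round_robin(len(hosts))            # state-independent baseline
```

In this toy setting the learned policy prefers the host that was rewarded, whereas the Round-Robin generator simply alternates between hosts. A realistic state encoding and reward function (for example, ones reflecting load balance and resource utilization) would have to come from the paper's own formulation.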