Error-control coding is nearly ubiquitous in today's information-based society. Among error-correcting techniques, polar codes have recently attracted considerable research attention, as they represent one of the foremost breakthroughs adopted in the 5G standard. Polar codes are constructed by exploiting the channel polarization effect of the polarization matrix, and they are provably capacity-achieving. Successive Cancellation List (SCL) decoding has been shown to improve the error-correction performance of polar codes; however, its decoding latency grows as the code length increases. Reinforcement learning algorithms (RLAs) are expected to reduce this decoder latency. Therefore, this article proposes a decoding algorithm based on a Markov decision process (MDP), which the RLA uses when the decoding probabilities are unknown. The proposed decoder is also implemented as a hardware architecture. Implementation results show that this method reduces the decoding latency to 33% without degrading the frame error rate (FER), and experimental results show that the hardware complexity is also lower than that of SCL decoding. The design was developed using System Generator (Xilinx), targeting a Virtex-6 FPGA.
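To make "the polarization effect of the polarization matrix" concrete, the sketch below shows the standard polar encoding step x = u · F^{⊗n} over GF(2), where F = [[1, 0], [1, 1]] is Arıkan's 2×2 kernel and F^{⊗n} is its n-fold Kronecker power. This is a generic illustration only, not the authors' MDP/RL decoder or their System Generator design. The function names `polar_encode` and `kron_power` are hypothetical, frozen-bit selection is left to the caller, and the bit-reversal permutation used in some formulations is omitted.

```python
from typing import List


def polar_encode(u: List[int]) -> List[int]:
    """Compute x = u * F^{(x)n} over GF(2), with F = [[1, 0], [1, 1]].

    u holds frozen + information bits; its length N must be a power of two.
    Assumption: no bit-reversal permutation is applied in this sketch.
    """
    n_bits = len(u)
    if n_bits == 0 or n_bits & (n_bits - 1):
        raise ValueError("block length must be a power of two")
    x = list(u)
    step = 1
    while step < n_bits:
        # One polarization stage: XOR each upper half-block with its lower half.
        for i in range(0, n_bits, 2 * step):
            for j in range(i, i + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def kron_power(n: int) -> List[List[int]]:
    """Return the n-fold Kronecker power of F as a list of rows (size 2^n)."""
    f = [[1, 0], [1, 1]]
    g = [[1]]
    for _ in range(n):
        # (A kron B)[i*2 + k][j*2 + l] = A[i][j] * B[k][l]
        g = [[a * b for a in row_g for b in row_f] for row_g in g for row_f in f]
    return g


def encode_with_matrix(u: List[int]) -> List[int]:
    """Reference encoder: explicit vector-matrix product mod 2."""
    n = len(u).bit_length() - 1
    g = kron_power(n)
    return [sum(u[i] * g[i][j] for i in range(len(u))) % 2 for j in range(len(u))]


if __name__ == "__main__":
    u = [1, 0, 1, 1]
    print(polar_encode(u))        # butterfly encoder
    print(encode_with_matrix(u))  # explicit matrix product, same result
```

The butterfly loop needs O(N log N) XOR operations instead of the O(N²) explicit matrix product, which is why hardware polar encoders and SC-family decoders are typically organized in log₂N stages.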