Audio-visual speech recognition (AVSR) is an emerging field of research, yet there is still a shortage of visual features suitable for recognizing visual speech. Human lip-readers are increasingly presented as useful in gathering forensic evidence but, like all humans, are unreliable when analyzing lip movements. Speaker-independent lip-reading is especially demanding because of unpredictable variation between speakers. At the same time, recent advances in signal processing and computer vision have made automated lip reading a field of great interest, which is why AVSR has attracted attention as a reliable solution to the speech-detection problem. Here we use a custom dataset and design a system that predicts the spoken word from lip movements. We extract Mel-frequency cepstral coefficients (MFCCs) for audio processing, use an LSTM network for visual speech recognition, and finally integrate the audio and video streams with a feed-forward neural network (FFNN). The fused model makes more reliable decisions when predicting the spoken word, achieving an accuracy of about 92.38%.
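
The late-fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, hidden size, vocabulary size, and the randomly initialized weights are all assumptions standing in for trained parameters; in practice the audio vector would come from MFCC extraction and the visual vector from the LSTM's final state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 13 MFCC coefficients
# averaged over the utterance, a 64-d visual embedding from the LSTM,
# and a vocabulary of 10 candidate words.
N_MFCC, N_VISUAL, N_HIDDEN, N_WORDS = 13, 64, 32, 10

def ffnn_fuse(audio_feat, visual_feat, W1, b1, W2, b2):
    """Late fusion: concatenate the two modality features,
    then apply a two-layer feed-forward network with softmax output."""
    x = np.concatenate([audio_feat, visual_feat])
    h = np.tanh(W1 @ x + b1)            # hidden layer
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()                  # probability for each word class

# Random weights stand in for parameters learned during training.
W1 = rng.standard_normal((N_HIDDEN, N_MFCC + N_VISUAL)) * 0.1
b1 = np.zeros(N_HIDDEN)
W2 = rng.standard_normal((N_WORDS, N_HIDDEN)) * 0.1
b2 = np.zeros(N_WORDS)

# Dummy feature vectors in place of real MFCC / LSTM outputs.
probs = ffnn_fuse(rng.standard_normal(N_MFCC),
                  rng.standard_normal(N_VISUAL),
                  W1, b1, W2, b2)
predicted_word = int(np.argmax(probs))  # index of the most likely word
```

The design choice this illustrates is late fusion: each modality is encoded independently and the FFNN only combines the resulting feature vectors, so a failure in one stream (e.g. noisy audio) can be compensated by the other.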