Background: Heart sound segmentation is a long-standing problem in heart sound analysis; its difficulty stems mainly from noise interference and the diversity of heart sounds. To address this challenge, a more broadly applicable segmentation model was studied. Methods: The optimally modified log-spectral amplitude and wavelets were used to suppress noise in the heart sound, and a duration-dependent hidden Markov model based on a personalized Gaussian mixture model (PGMM-DHMM) was used to segment the fundamental heart sound (FHS) and the non-fundamental heart sound (non-FHS). Optimized Mel-frequency cepstral coefficients (MFCCs) were then used to classify S1 and S2 heart sound frames with a convolutional neural network (CNN) classifier, which avoids the errors caused by the ambiguity of time-domain features. Results: PGMM-DHMM segmented FHS more effectively, with an accuracy of 94.3%. The CNN classifier obtained the best results in S1 and S2 classification, with an accuracy of 90.92%. For S1, the precision was 90.76%, the recall 91.05%, and the F-measure 90.9%; for S2, the precision was 91.07%, the recall 90.79%, and the F-measure 90.93%. The final segmentation accuracy was 92.92%. In addition, the experimental results indicate that the CNN is more robust when classifying abnormal S1 and abnormal S2. Conclusions: The PGMM-DHMM model segments FHS and non-FHS more effectively. Optimizing the MFCCs improves the classification of S1 and S2, and the improvement from the CNN classifier is significant, especially for abnormal heart sounds. The proposed algorithm outperforms other current algorithms.
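
As a reading aid, the reported F-measures can be checked against the reported precision and recall. The sketch below assumes the F-measure is the standard balanced F1 score, i.e. the harmonic mean of precision and recall; the abstract does not state the weighting, and the function name `f_measure` is illustrative rather than taken from the work.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced (beta = 1) form.
    Inputs and output share the same unit (here, percent).
    """
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
s1_f = f_measure(90.76, 91.05)
s2_f = f_measure(91.07, 90.79)

print(f"S1 F-measure: {s1_f:.2f}%")
print(f"S2 F-measure: {s2_f:.2f}%")
```

Under this assumption, both computed values agree with the reported 90.9% (S1) and 90.93% (S2) to within rounding.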