To improve speech emotion recognition accuracy when multi-level features are available, it is natural to fuse their emotion information while simultaneously reducing the distribution mismatch among them. To address these issues, we propose a novel transfer multiple kernel learning method for fusing multi-level features (MFF-TMKL), which is built on two constraints. The first constraint fuses the emotion information from multi-level features: a weighted collaborative representation constraint is proposed to learn the similarity and distinctiveness of multi-level features in a space induced by multiple kernel functions. The second constraint reduces the Maximum Mean Discrepancy (MMD) of multi-level features in the same multiple-kernel-induced space. Experimental results on the Aibo emotional speech database show that: (1) MFF-TMKL achieves a significant improvement over MKL, TMKL, and MFF-TMKL with only one constraint; specifically, using the utterance-level and unvoiced-level features, MFF-TMKL achieves the best recognition accuracy of 43.4%; (2) using the utterance-level and unvoiced-level features, the recognition rate of MFF-TMKL is higher than that of several state-of-the-art methods.
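As a rough illustration of the second constraint, the squared MMD between two feature sets can be evaluated under a convex combination of kernels. The sketch below is not the authors' implementation; the RBF bandwidths, kernel weights, and synthetic data are assumptions for demonstration only.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared Euclidean distances, then the RBF map
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2_multi_kernel(X, Y, gammas, betas):
    """Squared MMD between samples X and Y under the combined
    kernel k = sum_m beta_m * k_{gamma_m} (weights beta_m would be
    learned jointly in an MKL objective; fixed here for illustration)."""
    mmd2 = 0.0
    for gamma, beta in zip(gammas, betas):
        Kxx = rbf_kernel(X, X, gamma)
        Kyy = rbf_kernel(Y, Y, gamma)
        Kxy = rbf_kernel(X, Y, gamma)
        mmd2 += beta * (Kxx.mean() + Kyy.mean() - 2 * Kxy.mean())
    return mmd2

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 8))  # stand-in for one feature level
Y = rng.normal(1.0, 1.0, size=(200, 8))  # stand-in for another, mean-shifted level
gammas = [0.1, 1.0]                      # assumed RBF bandwidths
betas = [0.5, 0.5]                       # assumed kernel weights
print(mmd2_multi_kernel(X, Y, gammas, betas))  # positive: distributions differ
print(mmd2_multi_kernel(X, X, gammas, betas))  # 0 for identical samples
```

Minimizing this quantity over the kernel weights is what drives the two feature levels toward a shared distribution in the induced space.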