Because of their potential application value, multi-UAV air combat missions have become a hot topic in military research at home and abroad. With the rise of artificial intelligence and intelligent warfare, a series of UAV air combat decision-making methods based on deep reinforcement learning have been proposed in various countries. However, when the number of agents increases or the aircraft type changes, the action space and state space of the reinforcement learning problem change as well, so the training process becomes unstable and extremely sensitive to hyperparameters (such as the learning rate). Therefore, starting from traditional multi-agent reinforcement learning with an actor-critic (AC) decision network, this paper improves the decision network, introduces curriculum learning and transfer learning, and proposes a multi-agent reinforcement learning decision framework that converges quickly and trains more robustly.