Language is a remarkable cognitive ability that can be expressed through the visuospatial (written) or audio-oral (spoken) modality. When visual characters and auditory speech carry conflicting features, individuals may selectively attend to one modality; however, which modality dominates in such situations, and the neural mechanism underlying this dominance, remain unclear. To investigate the neural mechanism of audio-visual competition in Chinese, we presented newly developed Chinese character-speech materials to the study participants and recorded their electroencephalographic (EEG) and behavioural responses. Behaviourally, when audio-visual competition occurred, the brain recognized meaningless speech sounds more accurately. Event-related potential (ERP) analysis showed that incongruent audio-visual stimuli elicited a larger N400 amplitude than congruent audio-visual stimuli did, and that the N400 amplitude was larger in the auditory mismatch condition than in the visual mismatch condition. At the brain-network level, relative to the visual mismatch condition, the dominant reconfiguration pattern in the auditory mismatch condition was characterized by stronger linkages in posterior occipital-parietal areas. Our research demonstrates the superiority of the auditory modality over the visual modality, extending our understanding of the neural mechanisms of audio-visual competition in Chinese.