There remains considerable room for improvement in story generation for the Chinese language. In this paper, we propose a novel approach that combines multi-channel word embedding with effective control of the part-of-speech structure of generated sentences, allowing the model to imitate a writing style. The proposed approach consists of four parts. We first preprocess the sentences, labeling every sentence in the data set according to the format <SOS> <MOS> <EOS>, where <SOS>, <EOS>, and <MOS> denote the beginning of a sentence, the end of a sentence, and the separator between sentences, respectively. We then propose a multi-channel method that embeds words by integrating traditional vectorization methods, including Word2vec, FastText, LexVec, and GloVe, to enrich the information carried by the input data. We next optimize the model architecture to effectively control the sentence-generation process, building on the BERT (Bidirectional Encoder Representations from Transformers) model. Finally, we apply several performance optimizations. For example, the softmax function in the model is optimized to reduce search time during training, and the GAN (generative adversarial network) architecture used on the data set is revised to improve the training performance of the model. All sentences in the data set are organized into a tree structure, and the part-of-speech structure of the next sentence is generated by the model based on an FP-tree (frequent-pattern tree). The experimental results show that the proposed method can effectively control the generation of Chinese stories.
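For illustration, the labeling and multi-channel embedding steps might be sketched as follows. This is a minimal sketch, not the authors' implementation: the dictionary layout, the 300-dimensional channels, the zero-vector fallback for unseen tokens, and the use of concatenation to merge channels are all assumptions made here for clarity.

```python
import numpy as np

# Per-channel lookup tables; in practice each would be loaded from a
# pretrained Word2vec, FastText, LexVec, or GloVe model.
CHANNELS = {
    "word2vec": {},  # token -> np.ndarray of shape (DIM,)
    "fasttext": {},
    "lexvec": {},
    "glove": {},
}
DIM = 300  # assumed per-channel dimension

def label_story(sentences):
    """Wrap a story with <SOS>/<EOS> and insert <MOS> between sentences,
    following the labeling format described in the abstract.
    `sentences` is a list of already-segmented token lists."""
    tokens = ["<SOS>"]
    for i, sentence in enumerate(sentences):
        tokens.extend(sentence)
        if i < len(sentences) - 1:
            tokens.append("<MOS>")
    tokens.append("<EOS>")
    return tokens

def multi_channel_embed(token):
    """Look up one token in every channel and concatenate the results;
    tokens missing from a channel fall back to a zero vector."""
    vectors = [CHANNELS[name].get(token, np.zeros(DIM)) for name in CHANNELS]
    return np.concatenate(vectors)  # shape: (4 * DIM,)

# Example: a two-sentence story becomes one labeled token sequence.
story = [["小明", "出门"], ["天", "下雨", "了"]]
print(label_story(story))
# ['<SOS>', '小明', '出门', '<MOS>', '天', '下雨', '了', '<EOS>']
```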
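The FP-tree based part-of-speech step could likewise be approximated by mining frequent POS patterns from the data set. The sketch below uses mlxtend's fpgrowth (an FP-tree based miner) as a stand-in for the paper's own tree construction; the tags and the support threshold are invented for illustration, and note that FP-growth mines unordered itemsets, so it only approximates the sequential POS structure described above.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Each sentence's part-of-speech sequence is treated as one transaction.
pos_sequences = [
    ["n", "v", "n"],         # e.g. noun-verb-noun
    ["n", "v", "adj", "n"],
    ["pron", "v", "n"],
]

encoder = TransactionEncoder()
onehot = encoder.fit(pos_sequences).transform(pos_sequences)
df = pd.DataFrame(onehot, columns=encoder.columns_)

# Frequent POS-tag sets; a generator could draw the next sentence's
# structure from patterns like these.
frequent_patterns = fpgrowth(df, min_support=0.5, use_colnames=True)
print(frequent_patterns)
```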
Figures 1–11 are included in the PDF version of this preprint.
This preprint is available for download as a PDF.
Posted 18 Mar, 2021