Digital Image Art Style Transfer Algorithm and Simulation Based on Deep Learning Model

To address poor region delineation and boundary artifacts in Indian-style migration of images, an improved Variational Autoencoder (VAE) method for dress style migration is proposed. First, the Yolo v3 model is used to quickly localize the dress in the input image; then the classical semantic segmentation algorithm (FCN) is used to delineate the desired dress style migration region more finely in a second pass; finally, the trained VAE model is used to generate the migrated Indian-style image using a decision support system. The results show that, compared with the traditional style migration model, the improved VAE style migration model produces finer synthetic images for dress style migration and can adapt to different traditional Indian styles, meeting the application requirements of dress style migration scenarios. We evaluated several deep learning-based models and achieved an average BLEU value of 0.6. The transformer-based model outperformed the other models, achieving a BLEU value of up to 0.72.


Introduction
In art creation, style is a specific, abstract representation of the characteristics of an artistic school. With the same content, different styles can convey different historical backgrounds and cultural allegories. In digital image processing, image style migration refers to extracting the content features of one image and the style features of another image separately, and then fusing them to generate an image with a new style [1]. Here, image style mainly comprises image texture and image color. In the field of art creation, artists use brushes and dyes to draw or imitate various art styles, a difficult task that requires professional skills and a lot of time; with the help of computers, this task can be made less difficult and the results more satisfactory. In the field of computer vision, traditional image style migration methods have many drawbacks in practice: they require professional style analysis of the images in advance, followed by mathematical modeling of abstract style features using complex and tedious formulas [2]. This process is time-consuming and labor-intensive, and each specific image style often needs to be modeled in its own specific way; even so, the visual results are unsatisfactory, and the generality and usability of the resulting models are extremely poor.
With the development of deep learning in recent years [3], a series of significant breakthroughs have been made in the field of computer vision, in areas such as image classification, image segmentation, and object localization. In 2015, [4] pioneered a deep learning-based image style migration method, which greatly improved the effectiveness and usability of image style migration, as shown in Figure 1. Since then, deep learning-based image style migration has attracted wide attention from academia and industry and has achieved good results in practical applications, as seen in the emergence of Prisma, Ostagram, Deep Forger, and other highly popular image processing applications. It is expected that in the near future, deep learning-based image style migration will be widely used in film and television special effects, industrial simulation design, artwork design, and other fields.
Compared with traditional image style migration methods, the stylized images produced by deep learning-based methods have significant advantages in texture and color. With deep learning, high-level abstract features of images, such as texture, color, and structure, can be efficiently extracted and combined in a way that is consistent with human visual habits. The approach is versatile and easy to use, and it eliminates the need for repetitive and tedious mathematical modeling [5].
Image style migration is an interesting and important technique in the field of computer vision, and the emergence of deep learning-based methods has promoted its development. At present, two problems still need to be addressed. First, current deep learning-based image style migration methods are computationally intensive, which largely limits their adoption in practical applications, so the efficiency of the current algorithms must be improved or a better solution proposed. Second, these methods are prone to unstable image quality in the generated images, and the effect of the stylized images still has much room for improvement. Therefore, how to improve the computational efficiency of deep learning-based image style migration methods, enhance the visual effect of stylized images, and compress the model parameters as far as possible is an important research topic, and one that matters for commercial application.
The main contributions of this paper are as follows. We design a style migration algorithm based on the traditional variational self-encoder (VAE) and apply it to the style migration of images. We use variational self-encoders to extract styles from style images, which are then applied to the local clothing regions where a style change is desired. The Yolo v3 algorithm is used to detect the clothing models. We then perform a more accurate semantic segmentation of the target region using the classical semantic segmentation algorithm to extract local targets and achieve style migration.
The rest of the paper is organized as follows. Section 2 summarizes and analyzes related work, Section 3 presents an overview of the proposed system, Section 4 discusses the experimental setup and results in detail, and Section 5 concludes the paper.

Related Work
Image style is a composite artistic characteristic embodied in painting, and it has long been a focus of the many competing schools of thought in the art world. With the development of computer technology, digital image processing has become the most widespread means of producing paintings. Digital image creation draws on many areas of mathematical modeling, such as linear algebra, calculus, and statistics. Image processing has likewise evolved from simple linear transformations to complex mathematical modeling in order to meet people's varied needs.
In the traditional image style migration methods, the implementation of style migration focuses on the drawing of object models and the synthesis of image textures.
[6] proposed an algorithm that synthesizes new textures simply by stitching and reorganizing sample textures. [7] proposed a method based on the idea of analogy, which synthesizes images with new textures by mapping relationships of image analogous features.
[8] used modules such as a multi-layer texture array, a Chinese painting lighting model, and contour line extraction to draw 3D mountain scenes with a Chinese painting effect in real time. [9] proposed a neighborhood consistency metric that improves the efficiency of image matching point search by introducing statistical properties into the similarity metric. Although these methods have achieved considerable results on images with simple structures, their results rarely meet practical needs for images with more complex colors and textures. The emergence of deep learning has changed this situation and has greatly promoted the development of image style migration.
Thanks to the rise of deep learning, [10] first discovered that pre-trained convolutional neural network models could be used as feature extractors to extract abstract features of images, which could then be separated and recombined with stunning artistic results. [11] used Gatys et al.'s feature extractor as the core part of the objective function of a feedforward network, maintaining the same image migration effect while improving computational efficiency by three orders of magnitude. Building on this, [12] suggested that it is redundant to train images with similar styles separately, and proposed training images of the same type together after normalization, as well as combining images of multiple styles at the same time. [13] focused on improving the controllability of spatial location, color information, and scale when migrating images, effectively improving the quality and flexibility of migration. For example, on a content image with grass and sky, control over spatial information can be used so that the grass receives the texture of one style image and the sky receives the texture of another. [14] used image style migration to turn a doodle with only a few colors into a beautiful painting. In addition, since portrait migration often distorts the face structure, [15] introduced the concept of mapping enhancement to control the spatial structure, which enables portrait migration to transfer textures while preserving the face structure.
[16] introduced image semantic segmentation techniques that make it possible to migrate individual target objects in an image. [17] used an image style migration method to achieve image super-resolution and obtained very good results.
[18] extended image style migration to videos, making the style of the whole video consistent with the style image, and addressed the unstable, flickering output that image style migration methods tend to produce when applied to videos. However, the method runs too slowly, taking several minutes per frame, so its practicality is low. [19] proposed applying end-to-end network training to video style migration, which further improves the speed of style migration while ensuring the stability of video frames. [20] used image style migration to colorize sketches, which can save a large amount of coloring time.
At present, although deep learning-based image style migration has obtained good results, its underlying principle is still relatively unclear. One example is the Gram matrix proposed by [21]: although it is successful in extracting image texture, it lacks convincing theoretical support, and the related literature improves on it only by adjusting parameters and similar means, without direct, in-depth theoretical study. In contrast, [22] argued that matching Gram matrices is equivalent to minimizing the maximum mean discrepancy (MMD), and argued that deep learning-based image style migration is an approach with both theoretical and empirical grounding.
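The relationship between the Gram matrix and the maximum mean discrepancy mentioned above can be checked numerically. The following is a minimal sketch, assuming a second-order polynomial kernel k(a, b) = (a · b)^2 and feature maps flattened to shape (channels, positions); the function names and the random test data are illustrative and not taken from the paper.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map given as (channels, positions)."""
    return features @ features.T

def mmd2_poly2(x, y):
    """Biased squared MMD with the kernel k(a, b) = (a . b) ** 2.

    x and y hold one sample per column, shape (channels, positions),
    and are assumed here to have the same number of positions.
    """
    kxx = (x.T @ x) ** 2
    kyy = (y.T @ y) ** 2
    kxy = (x.T @ y) ** 2
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

# Illustrative random "feature maps" (assumption: 4 channels, 50 positions).
rng = np.random.default_rng(0)
channels, positions = 4, 50
style_feat = rng.normal(size=(channels, positions))
generated_feat = rng.normal(size=(channels, positions))

# Squared Frobenius distance between Gram matrices, scaled by positions ** 2.
gram_gap = np.sum((gram_matrix(style_feat) - gram_matrix(generated_feat)) ** 2)
gram_loss = gram_gap / positions ** 2
mmd_loss = mmd2_poly2(style_feat, generated_feat)

print(gram_loss, mmd_loss)  # the two values agree up to rounding
```

Under these assumptions the two quantities coincide, which is one way to read the claim that Gram-matrix matching aligns the distributions of feature activations.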

Style Migration Based On Variational Self-encoder
The automatic discovery and recognition of visual concepts from raw image data is a major open challenge for AI research. To address this problem, researchers have proposed variants of unsupervised learning methods that represent potentially complex factor relationships. One such approach takes inspiration from neuroscience and shows how this can be achieved by applying the same learning capabilities in an unsupervised generative model: by simulating the ventral visual pathway in the brain, forcing redundancy reduction, and encouraging statistical independence, a variational self-encoder (VAE) framework capable of learning complex factors is built. The existing variational self-encoder model uses adversarial training of a discriminator and the variational self-encoder so that the encoder can isolate, in latent space, a representation of the image content. This content representation is then used as the input to the generator, together with the target style vector Z, to generate the target style image. The style vector added at the generator side is obtained from a binary label vector by a linear transformation. Variational self-encoders have shown excellent results in training and testing on a wide range of datasets. The framework learns interpretable factorized representations of the independent generative factors of the data without supervision. In this sense, artificial intelligence can learn and reason in a human-like way and can automatically discover interpretable factorized latent representations from raw image data in a completely unsupervised manner [23].

Overall Structure
A self-encoder is a form of data processing in which an encoder maps the target data X to a vector Z and a decoder regenerates X' from Z. Since the form of Z is fixed, the working process of the self-encoder is fixed as well and cannot meet the demand of processing arbitrary data in multiple forms. Researchers therefore proposed variational self-encoders to solve this problem.
See Fig. 2 for a schematic diagram of the structure of the variational self-encoder.
As shown in Fig. 2, a new latent vector Z is generated directly from the original data; it carries both the information of the original data and noise. The original data samples {X_1, X_2, …, X_n} are denoted as a whole by X, and the distribution of X is p(X). The description of the underlying structural dimensions is the key difference between the variational self-encoder and the ordinary self-encoder.
The internal schematic of the variational self-encoder is shown in Fig. 2. The vector Z itself does not need to explain each dimension explicitly: samples of Z can be drawn from a simple distribution, N(0, I), where I is the identity matrix. This suffices because any distribution in n-dimensional space can be generated by taking n normally distributed variables and mapping them through a sufficiently complex function. In the variational self-encoder, the encoder takes the original data as input and produces the probability distribution of the latent variables, while the decoder produces the conditional distribution of the new X'. The reconstruction process becomes more complicated because of the added noise, but it is precisely this noise that increases the randomness of the reconstruction results, with the aim of obtaining a better reconstruction model [24].
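The sampling step described above is commonly implemented with the reparameterization trick: noise is drawn from N(0, I) and then shifted and scaled by the encoder outputs. The sketch below assumes the encoder outputs a mean vector and a log-variance vector; the names `sample_latent`, `mu`, and `log_var` and the example values are illustrative assumptions, not details given in the paper.

```python
import numpy as np

def sample_latent(mu, log_var, rng, n=1):
    """Draw n latent vectors Z = mu + sigma * eps with eps ~ N(0, I).

    mu and log_var are 1-D arrays standing in for encoder outputs
    (assumed parameterization: diagonal Gaussian with log-variance).
    """
    eps = rng.standard_normal((n, mu.shape[0]))
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                 # hypothetical encoder mean
log_var = np.log(np.array([0.25, 4.0]))    # i.e. sigma = 0.5 and 2.0
samples = sample_latent(mu, log_var, rng, n=20000)

# Empirical statistics should be close to the requested mean and sigma.
print(samples.mean(axis=0), samples.std(axis=0))
```

Keeping the randomness in `eps` means the mapping from `mu` and `log_var` to Z stays differentiable, which is why this form is typically used during training.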

Image style migration algorithm based on variational self-encoder
Based on the characteristics of the variational self-encoder, this paper designs a style migration algorithm based on the variational self-encoder and applies it to the study of Chinese style migration of images.
The algorithm is redesigned based on the traditional variational self-encoder and consists of three main components: encoder, decoder and loss function [25].
The schematic structure of the image style migration algorithm based on the variational self-encoder is shown in Fig. 4. The raw inputs are the content image (content) and the synthetic image (style), which are fed to the encoder to obtain the latent style factor Z. The style factor is then passed to the decoder together with the content image, so that the content of the content image and the style of the synthetic image are fused into a new output image. In the loss function, a reconstruction loss evaluates the difference between the output image and the synthetic image, and a KL divergence loss constrains the style factor Z toward a normal distribution [26].
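The two loss terms described above can be sketched as follows. This is a minimal illustration, assuming a mean-squared-error reconstruction term, a diagonal Gaussian posterior over Z, and a standard normal prior; the paper does not state these details, and the function names and the `kl_weight` parameter are hypothetical.

```python
import numpy as np

def reconstruction_loss(output, target):
    """Mean squared error between two images (assumed form of the loss)."""
    return float(np.mean((output - target) ** 2))

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return float(0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

def vae_loss(output, target, mu, log_var, kl_weight=1.0):
    """Reconstruction term plus weighted KL term (kl_weight is an assumption)."""
    return (reconstruction_loss(output, target)
            + kl_weight * kl_to_standard_normal(mu, log_var))

# A posterior that already equals N(0, I) contributes no KL penalty.
print(kl_to_standard_normal(np.zeros(3), np.zeros(3)))            # 0.0
# Shifting one mean component by 1 costs 0.5 nats.
print(kl_to_standard_normal(np.array([1.0, 0.0]), np.zeros(2)))   # 0.5
```

The KL term is what keeps the style factor close to N(0, I), so that sampling Z directly from the prior at generation time remains meaningful.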

Apparel image pre-processing and style migration solutions
Incorporating Chinese style into currently popular clothing is not simply a matter of restyling the entire image, because there is no clothing without style, nor any style that exists separately from clothing. The boundary between content and style is very blurred, and drawing that line is even more difficult in the style transfer of clothing.
In this paper, we study the use of variational self-encoders to extract styles from style images and apply them to the local clothing regions where a change in style is desired. The main preprocessing steps for clothing images are target detection and target segmentation [27]. The Yolo v3 algorithm is chosen to detect the clothing models in the content images. A more accurate semantic segmentation of the target region is then performed using the classical semantic segmentation algorithm (FCN) to extract local targets accurately, so that style migration is finally applied only to local regions.
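The last step of this pipeline, applying the style only to the detected and segmented region, can be sketched as a masked composite. The sketch assumes the detector has already produced a bounding box and the segmentation network a binary mask; `composite_local_style` and its arguments are hypothetical names, and the arrays below are toy data rather than real detector or segmentation outputs.

```python
import numpy as np

def composite_local_style(content, stylized, mask, box):
    """Copy stylized pixels into content only where mask is set inside box.

    content, stylized: arrays of shape (H, W, 3).
    mask: array of shape (H, W), nonzero where the garment was segmented.
    box: (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = box
    region = np.zeros(mask.shape, dtype=bool)
    region[top:bottom, left:right] = True
    keep = (mask.astype(bool) & region)[..., None]  # broadcast over channels
    return np.where(keep, stylized, content)

# Toy example: a 4x4 image whose mask slightly overshoots the detection box.
content = np.zeros((4, 4, 3))
stylized = np.ones((4, 4, 3))
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:4] = 1
result = composite_local_style(content, stylized, mask, box=(0, 0, 4, 3))
print(result[..., 0])
```

Intersecting the mask with the box is one simple way to let the coarse detection suppress segmentation pixels that fall outside the detected garment.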
The Mask-RCNN used in this study uses Faster-RCNN as the main framework and introduces an additional parallel FCN branch in the head of the network to predict the mask of each ROI, so that the head contains three sub-tasks: classification, regression, and segmentation. Phase 1 scans the image and generates proposals (i.e., regions that may contain a target), while phase 2 classifies the proposals and generates bounding boxes and masks.
The Mask-RCNN process is usually as follows. An image is pre-processed (or an already pre-processed image is supplied directly) and fed into a pre-trained neural network to obtain the corresponding feature map. A predefined set of ROIs is placed at each point of the feature map, yielding several candidate ROIs. For the remaining ROIs, the pixels of the original image are matched to the feature map, and the feature map is matched to fixed-size features; that is, each sampling point in the ROI is bilinearly interpolated from the 4 vertices of the grid cell in which it is located. Finally, these ROIs are used for classification, bounding-box regression, and mask generation (an FCN operation inside each ROI) [24].
Based on the traditional variational self-encoder, the encoder and decoder are adjusted so that garment style migration can be carried out in several ways to achieve different effects. In the first method, the complete variational self-encoder architecture is kept, and the overall model is used as the style migration network. The Chinese-style images are input to the encoder to obtain the latent variables, the pre-processed original garment content images are also input, and the decoder outputs the stylized composite images, in which the local details carry the Chinese style. In the second method, the encoder is bypassed: the content image is input to the decoder, and samples from the normal distribution serve as the latent style variables, so that the garment content stays fixed while the target garment is rendered in multiple styles. In the third method, the style code is fixed, the input garment content image is varied, and the encoder step that extracts the latent style is bypassed, which produces garment samples with the same style but different content.
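The bilinear interpolation used when aligning ROIs with the feature map, as described in the Mask-RCNN process above, can be illustrated on a single channel. This is a minimal sketch of the interpolation step only, not of the full ROI alignment operation; `bilinear_sample` is a hypothetical helper, and the clamping at the borders is an assumption.

```python
import math

def bilinear_sample(feature_map, y, x):
    """Sample a 2D grid (list of rows) at a fractional (y, x) position.

    The value is a weighted average of the 4 grid vertices surrounding
    the point; coordinates are clamped to the grid (an assumption).
    """
    height, width = len(feature_map), len(feature_map[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feature_map[y0][x0] + wx * feature_map[y0][x1]
    bottom = (1 - wx) * feature_map[y1][x0] + wx * feature_map[y1][x1]
    return (1 - wy) * top + wy * bottom

grid = [[0.0, 1.0],
        [2.0, 3.0]]
print(bilinear_sample(grid, 0.5, 0.5))  # 1.5, the average of the 4 vertices
```

Sampling at fractional coordinates avoids rounding ROI boundaries to whole feature-map cells, which matters for pixel-accurate masks.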

Experiment content
In this section, the experiments run on a 64-bit Windows 7 operating system with a dual-core Intel i5 CPU, 8 GB of memory, and an NVIDIA GTX 1050Ti GPU. The deep learning framework is TensorFlow, and the image processing toolkits are OpenCV and PIL. All image data are sourced from the Internet, and all experimental images are cropped or stretched with the image toolkit to facilitate the experiments and the presentation of the results. The size of the experimental input data is not fixed, but it must be consistent within one set of experimental data; that is, the content image and the style image must have the same size and color channels.
In this paper, the convolution layers of the pre-trained VGG-19 network model are used as the abstract feature extractor. The number and relative position of these network layers determine the local scale of image style matching, which plays a decisive role in the final visual effect of the synthetic image. In the experiment, 'conv4_2' is used as the content representation layer of the content image, with content loss weight α = 100.0, and 'conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', and 'conv5_1' are used as the style representation layers of the style image, with style loss weight β = 1000.0. The smoothing weight of the composite image x is γ = 0.001, and the color migration weights are λ_L = 1.2 and λ_A = λ_B = 1.3. During training, the Adam algorithm, which is based on stochastic gradient descent, is used, and 1500 optimization iterations are carried out through backpropagation to minimize equation (6). The computation time on a single GPU is about 150 seconds.
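How the three weights quoted above combine can be sketched as a weighted sum of a content term, a Gram-matrix style term, and a smoothness term. Equation (6) is not reproduced in this section, so the form below is an assumption modeled on the common Gatys-style objective; the normalization constants, the function names, and the omission of the color migration weights are all choices made for this illustration, and the features are plain arrays rather than real VGG-19 activations.

```python
import numpy as np

ALPHA, BETA, GAMMA = 100.0, 1000.0, 0.001  # weights quoted in the text

def content_loss(gen_feat, content_feat):
    """Half the squared error between two feature maps (assumed form)."""
    return 0.5 * np.sum((gen_feat - content_feat) ** 2)

def gram(feat):
    """Gram matrix of features with shape (channels, height * width)."""
    return feat @ feat.T

def style_loss(gen_feats, style_feats):
    """Average normalized Gram distance over the style layers."""
    total = 0.0
    for g, s in zip(gen_feats, style_feats):
        channels, positions = g.shape
        diff = gram(g) - gram(s)
        total += np.sum(diff ** 2) / (4.0 * channels ** 2 * positions ** 2)
    return total / len(gen_feats)

def total_variation(image):
    """Sum of squared differences between neighboring pixels of (H, W, C)."""
    dy = image[1:, :, :] - image[:-1, :, :]
    dx = image[:, 1:, :] - image[:, :-1, :]
    return np.sum(dy ** 2) + np.sum(dx ** 2)

def total_loss(gen_content_feat, content_feat, gen_style_feats, style_feats, image):
    """Weighted sum of content, style, and smoothness terms."""
    return (ALPHA * content_loss(gen_content_feat, content_feat)
            + BETA * style_loss(gen_style_feats, style_feats)
            + GAMMA * total_variation(image))
```

In an actual run the feature arguments would come from the 'conv4_2' layer and the five style layers, and an optimizer such as Adam would update the image to reduce `total_loss`.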

Experimental results
Image texture synthesis is one of the important processes for synthesizing a new style of image, and its goal is to infer the process of synthesizing that image texture from an example image texture, which in turn can produce any number of new samples of that image texture. Image textures are pervasive image visual features that can be used to describe surface phenomena of things. The image texture structure reflects the spatial variation of the values of pixels in an image with a specific distribution pattern.
Compared with traditional image texture synthesis methods, the powerful parametric texture model of convolutional neural networks brings a substantial improvement in image texture synthesis. The quality of texture synthesis is usually evaluated by the line contours and color distribution of the synthesized texture: the higher the observed similarity between the synthesized texture and the example texture, the more natural the visual experience and the more successful the synthesis. As shown in Figure 5, the method of [24] produces unnatural and scattered artifacts in the synthesized image, and certain image areas show severe color corruption. This is largely because the color migration is performed in the RGB color space, where the color channels are strongly correlated, which can lead to color disorder. The method in this paper effectively addresses these problems and makes the overall color transition of the synthesized style image smooth and natural.
In terms of color control, the quality of image color migration largely determines the final result of color-preserving image style migration. The color information of an image is an important part of how its style is directly perceived, but the color distribution often appears uneven and mismatched after color migration, so effective color control is required to ensure the quality of the synthesized image. Therefore, in this paper, the color migration method of Reinhard et al. is improved by adding weight coefficients for the relevant color channels, and better color effects are obtained by adjusting these parameters. As shown in Fig. 6 and Fig. 7, compared with the method of Gatys et al., the color effect of this paper appears richer and more natural.
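A weighted variant of Reinhard-style color migration can be sketched as per-channel mean and standard deviation matching with an extra weight on each channel. The paper does not state how the weight coefficients enter the formula, so scaling the standard deviation ratio by the weight is an assumption made here; the default weights reuse the values λ_L = 1.2 and λ_A = λ_B = 1.3 quoted in the experimental setup, and the inputs are assumed to be already converted to a Lab-like color space (the conversion itself is omitted).

```python
import numpy as np

def weighted_reinhard_transfer(source_lab, target_lab, weights=(1.2, 1.3, 1.3)):
    """Shift each channel of source toward the statistics of target.

    source_lab, target_lab: arrays of shape (H, W, 3) in a Lab-like space.
    weights: per-channel factors applied to the std ratio (assumed usage).
    """
    src = source_lab.astype(np.float64)
    tgt = target_lab.astype(np.float64)
    out = np.empty_like(src)
    for c, lam in enumerate(weights):
        s_mean, s_std = src[..., c].mean(), src[..., c].std()
        t_mean, t_std = tgt[..., c].mean(), tgt[..., c].std()
        scale = lam * t_std / (s_std + 1e-8)  # epsilon guards flat channels
        out[..., c] = (src[..., c] - s_mean) * scale + t_mean
    return out

# Toy data standing in for Lab images.
rng = np.random.default_rng(0)
source = rng.normal(loc=2.0, scale=1.0, size=(16, 16, 3))
target = rng.normal(loc=5.0, scale=2.0, size=(16, 16, 3))
plain = weighted_reinhard_transfer(source, target, weights=(1.0, 1.0, 1.0))
print(plain.mean(axis=(0, 1)), target.mean(axis=(0, 1)))
```

With all weights equal to 1 this reduces to plain mean and standard deviation matching; weights above 1 widen the spread of the corresponding channel around the target mean.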
In addition, color and texture are two key elements of image style. In image style migration, color preservation is a typical use case with high requirements for color processing. Two color-preserving methods of image style migration are proposed in [20]: one is linear color migration in RGB color space, which migrates the color of the content image into the style image so that the color of the content image is preserved as far as possible during style migration; the other performs style migration only in the luminance channel of the content image, so that the original colors of the content image remain unchanged. [21] used a local linear model to enhance the coordination and correlation between local regions and the whole image, and made it possible for color migration to refer to multiple images, further enhancing the effect and flexibility of image color migration, as shown in Table 1.
With the development of deep learning-based image style migration, its commercial application value has received widespread attention, mainly in the following three aspects.
Image beautification is a popular application technique on social networks, for example for advertising images and selfie photos. However, traditional image style migration methods are simple and fixed in their digital image processing techniques, which makes it difficult for them to meet more abstract needs. Deep learning brings more room for innovation and imagination in image style design. In particular, the content-aware image style migration method is effective: it fully considers the two questions of "where to do image style migration" and "how to do image style migration", and it has shown excellent results in the field of image restoration, as shown in Table 2. In addition, image style migration can also colorize comic sketches; in the related work of [18], image style migration not only accomplished the colorization task brilliantly but also rendered the local features of the image very naturally. In terms of applications, Prisma, a mobile app, is one of the most popular free applications providing deep learning-based image style migration, and it can convert user input images into high-quality art style paintings in just a few seconds. Subsequently, a number of paid mobile apps and web-based systems for image style migration have emerged and have generated some commercial value. With the help of these applications, people can easily create works in their favorite art style without special expertise and without a lot of time and expense.
Visual effects-related technologies are found everywhere in the entertainment and film-related industries, such as film production, television production, and animation production. However, visual effects are very expensive to create. If artificial intelligence could be used to perform these tasks, the cost would be greatly reduced, and deep learning-based image style migration is one of the solutions to be considered. For example, [22] used optical flow techniques and a collection of deep convolutional neural networks to achieve artistic stylization for film production. The work of [16] fully considers the coherence between consecutive frames in video stylization by introducing a temporal consistency loss function that constrains the global variability of images between consecutive frames. [21] constructed a generative model with a temporal correlation constraint, which not only can perform a variety of stylization computations but also can perform real-time stylization of online videos. [23] analyzed image style migration in a more advanced abstraction of the hyperparameter space in deep learning and found a set of effective parameter module components for the impressionistic stylization of movie scenes. Deep learning-based image style migration in video processing still needs to be studied and analyzed more deeply; judging from current progress, its great potential commercial value will be further explored in the near future, as shown in Table 3.
Aids to style design. Image style migration can serve as an effective design aid in areas such as painting, architectural style design, clothing fashion design, and the design of special-effects scenes in games. Although there are not yet many references or successful applications, deep learning-based image style migration is likely to become an important research topic in the near future, given the significant breakthroughs in various fields of deep learning in recent years.
In academia, the methods generally fall into two main categories: image iteration-based methods and model iteration-based methods. Depending on how the image style is obtained, the image iteration-based methods can be categorized as MMD (Maximum Mean Discrepancy), MRF (Markov Random Field), and DIA (Deep Image Analogy). Depending on the model iteration method, the model iteration-based approaches can be categorized as generative model-based and image-reconstruction decoder-based. These representative methods produce excellent results, but some problems still need to be studied in depth.
The balance between content, texture, and color in image style migration determines how pleasing the final generated image is to view, and current failure cases are often caused by a poor balance among these three aspects. Therefore, an in-depth study of the balance between image content, texture, and color, together with systematic and repeated experiments on the adjustment of the related parameters and weights, is an important part of the work needed to further improve the quality of stylized images, as shown in Figure 8.

Conclusions
The deep learning-based image style migration method is a parametric generative model with a good fit. However, the current neural network model is a black box in which the physical meaning of the hyperparameters is difficult to understand or cannot be interpreted, which makes it very difficult to improve deep learning-based image style migration methods. Investigating this algorithm from a theoretical perspective is therefore an important challenge.

Declarations

Data Availability
The datasets used during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest
The author declares that there are no conflicts of interest.