In this section, in addition to the proposed model, a supervised model is designed and trained on labeled steady-state heat transfer data in order to evaluate how well the proposed model simulates the equilibrium condition. The network used in the proposed model comprises 12 hidden layers, consisting of 6 encoder and 6 decoder layers in the first part (the U-Net), followed by lambda layers in the second part that transfer the four boundary values of the input directly to the output. Further details of this architecture, including the number and type of layers, output dimensions, number of parameters of each layer, and how the layers are connected, are given in Table 1.
Table 1
Architectural details of the steady-state temperature distribution predictor network
layer number | layer type | output dimensions | parameters | previous layer
1 | Input layer | (None, 64, 64, 1) | 0 | –
2 | conv2d | (None, 64, 64, 16) | 160 | 1
3 | batchnormalization | (None, 64, 64, 16) | 64 | 2
4 | activation | (None, 64, 64, 16) | 0 | 3
5 | conv2d | (None, 64, 64, 16) | 2320 | 4
6 | batchnormalization | (None, 64, 64, 16) | 64 | 5
7 | activation | (None, 64, 64, 16) | 0 | 6
8 | maxpooling2d | (None, 32, 32, 16) | 0 | 7
9 | conv2d | (None, 32, 32, 32) | 4640 | 8
10 | batchnormalization | (None, 32, 32, 32) | 128 | 9
11 | activation | (None, 32, 32, 32) | 0 | 10
12 | conv2d | (None, 32, 32, 32) | 9248 | 11
13 | batchnormalization | (None, 32, 32, 32) | 128 | 12
14 | activation | (None, 32, 32, 32) | 0 | 13
15 | maxpooling2d | (None, 16, 16, 32) | 0 | 14
16 | conv2d | (None, 16, 16, 64) | 18496 | 15
17 | batchnormalization | (None, 16, 16, 64) | 256 | 16
18 | activation | (None, 16, 16, 64) | 0 | 17
19 | conv2d | (None, 16, 16, 64) | 36928 | 18
20 | batchnormalization | (None, 16, 16, 64) | 256 | 19
21 | activation | (None, 16, 16, 64) | 0 | 20
22 | maxpooling2d | (None, 8, 8, 64) | 0 | 21
23 | conv2d | (None, 8, 8, 128) | 73856 | 22
24 | batchnormalization | (None, 8, 8, 128) | 512 | 23
25 | activation | (None, 8, 8, 128) | 0 | 24
26 | conv2d | (None, 8, 8, 128) | 147584 | 25
27 | batchnormalization | (None, 8, 8, 128) | 512 | 26
28 | activation | (None, 8, 8, 128) | 0 | 27
29 | maxpooling2d | (None, 4, 4, 128) | 0 | 28
30 | conv2d | (None, 4, 4, 256) | 295168 | 29
31 | batchnormalization | (None, 4, 4, 256) | 1024 | 30
32 | activation | (None, 4, 4, 256) | 0 | 31
33 | conv2d | (None, 4, 4, 256) | 590080 | 32
34 | batchnormalization | (None, 4, 4, 256) | 1024 | 33
35 | activation | (None, 4, 4, 256) | 0 | 34
36 | maxpooling2d | (None, 2, 2, 256) | 0 | 35
37 | conv2d | (None, 2, 2, 512) | 1180160 | 36
38 | batchnormalization | (None, 2, 2, 512) | 2048 | 37
39 | activation | (None, 2, 2, 512) | 0 | 38
40 | conv2d | (None, 2, 2, 512) | 2359808 | 39
41 | batchnormalization | (None, 2, 2, 512) | 2048 | 40
42 | activation | (None, 2, 2, 512) | 0 | 41
43 | maxpooling2d | (None, 1, 1, 512) | 0 | 42
44 | conv2d | (None, 1, 1, 1024) | 4719616 | 43
45 | batchnormalization | (None, 1, 1, 1024) | 4096 | 44
46 | activation | (None, 1, 1, 1024) | 0 | 45
47 | conv2d | (None, 1, 1, 1024) | 9438208 | 46
48 | batchnormalization | (None, 1, 1, 1024) | 4096 | 47
49 | activation | (None, 1, 1, 1024) | 0 | 48
50 | upsampling2d | (None, 2, 2, 1024) | 0 | 49
51 | concatenate | (None, 2, 2, 1536) | 0 | 50, 42
52 | conv2d | (None, 2, 2, 512) | 7078400 | 51
53 | batchnormalization | (None, 2, 2, 512) | 2048 | 52
54 | activation | (None, 2, 2, 512) | 0 | 53
55 | conv2d | (None, 2, 2, 512) | 2359808 | 54
56 | batchnormalization | (None, 2, 2, 512) | 2048 | 55
57 | activation | (None, 2, 2, 512) | 0 | 56
58 | upsampling2d | (None, 4, 4, 512) | 0 | 57
59 | concatenate | (None, 4, 4, 768) | 0 | 58, 35
60 | conv2d | (None, 4, 4, 256) | 1769728 | 59
61 | batchnormalization | (None, 4, 4, 256) | 1024 | 60
62 | activation | (None, 4, 4, 256) | 0 | 61
63 | conv2d | (None, 4, 4, 256) | 590080 | 62
64 | batchnormalization | (None, 4, 4, 256) | 1024 | 63
65 | activation | (None, 4, 4, 256) | 0 | 64
66 | upsampling2d | (None, 8, 8, 256) | 0 | 65
67 | concatenate | (None, 8, 8, 384) | 0 | 66, 28
68 | conv2d | (None, 8, 8, 128) | 442496 | 67
69 | batchnormalization | (None, 8, 8, 128) | 512 | 68
70 | activation | (None, 8, 8, 128) | 0 | 69
71 | conv2d | (None, 8, 8, 128) | 147584 | 70
72 | batchnormalization | (None, 8, 8, 128) | 512 | 71
73 | activation | (None, 8, 8, 128) | 0 | 72
74 | upsampling2d | (None, 16, 16, 128) | 0 | 73
75 | concatenate | (None, 16, 16, 192) | 0 | 74, 21
76 | conv2d | (None, 16, 16, 64) | 110656 | 75
77 | batchnormalization | (None, 16, 16, 64) | 256 | 76
78 | activation | (None, 16, 16, 64) | 0 | 77
79 | conv2d | (None, 16, 16, 64) | 36928 | 78
80 | batchnormalization | (None, 16, 16, 64) | 256 | 79
81 | activation | (None, 16, 16, 64) | 0 | 80
82 | upsampling2d | (None, 32, 32, 64) | 0 | 81
83 | concatenate | (None, 32, 32, 96) | 0 | 82, 14
84 | conv2d | (None, 32, 32, 32) | 27680 | 83
85 | batchnormalization | (None, 32, 32, 32) | 128 | 84
86 | activation | (None, 32, 32, 32) | 0 | 85
87 | conv2d | (None, 32, 32, 32) | 9248 | 86
88 | batchnormalization | (None, 32, 32, 32) | 128 | 87
89 | activation | (None, 32, 32, 32) | 0 | 88
90 | upsampling2d | (None, 64, 64, 32) | 0 | 89
91 | concatenate | (None, 64, 64, 48) | 0 | 90, 7
92 | conv2d | (None, 64, 64, 16) | 6928 | 91
93 | batchnormalization | (None, 64, 64, 16) | 64 | 92
94 | activation | (None, 64, 64, 16) | 0 | 93
95 | conv2d | (None, 64, 64, 16) | 2320 | 94
96 | batchnormalization | (None, 64, 64, 16) | 64 | 95
97 | activation | (None, 64, 64, 16) | 0 | 96
98 | conv2d | (None, 64, 64, 1) | 17 | 97
99 | lambda | (None, 1, 62, 1) | 0 | 1
100 | lambda | (None, 62, 62, 1) | 0 | 98
101 | concatenate | (None, 63, 62, 1) | 0 | 100, 99
102 | lambda | (None, 1, 62, 1) | 0 | 1
103 | lambda | (None, 64, 1, 1) | 0 | 1
104 | concatenate | (None, 64, 62, 1) | 0 | 101, 102
105 | concatenate | (None, 64, 63, 1) | 0 | 103, 104
106 | lambda | (None, 64, 1, 1) | 0 | 1
107 | concatenate | (None, 64, 64, 1) | 0 | 105, 106
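To make the role of the final lambda and concatenate layers (rows 99-107 of Table 1) concrete, the following minimal sketch shows one way such a boundary-transfer stage could be written in TensorFlow/Keras. The tensor shapes match Table 1, but which strip corresponds to which edge, the slicing indices, and the function name are assumptions for illustration rather than the authors' exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def reimpose_boundaries(plate_input, prediction):
    """Crop the U-Net prediction to its 62 x 62 interior and stitch the four
    boundary rows/columns of the original input back around it."""
    interior = layers.Lambda(lambda t: t[:, 1:63, 1:63, :])(prediction)    # (None, 62, 62, 1)
    top      = layers.Lambda(lambda t: t[:, 0:1, 1:63, :])(plate_input)    # (None, 1, 62, 1)
    bottom   = layers.Lambda(lambda t: t[:, 63:64, 1:63, :])(plate_input)  # (None, 1, 62, 1)
    left     = layers.Lambda(lambda t: t[:, :, 0:1, :])(plate_input)       # (None, 64, 1, 1)
    right    = layers.Lambda(lambda t: t[:, :, 63:64, :])(plate_input)     # (None, 64, 1, 1)
    rows = layers.Concatenate(axis=1)([top, interior, bottom])             # (None, 64, 62, 1)
    return layers.Concatenate(axis=2)([left, rows, right])                 # (None, 64, 64, 1)
```

In this way the predicted field is forced to reproduce the four input boundary values exactly, and only the interior is left to the convolutional part of the network.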
The structure and architecture of the supervised model are the same as the U-Net used in the first part of the steady-state temperature distribution predictor network of the proposed model. Both models are set up under exactly the same conditions in terms of the number of main layers, the number of learnable parameters, and the amount of training data. The inputs to the supervised network are zero-valued matrices of size 64 × 64 whose four edges are set to four different values between 0 and 100, and the labels are the equilibrium conditions corresponding to these square inputs. In this experiment, the data generated by the finite difference method (the CFD simulation) is used as the reference for measuring the accuracy of the outputs produced by both networks. In the model created by the proposed method, only 400 samples of size 8 × 8 were used to extract the steady-state heat transfer pattern; its steady-state temperature distribution predictor network is trained to minimize the error defined in Equation 2 without directly observing any heat distribution data. In contrast, the supervised model used 5,000 labeled steady-state heat transfer samples to train its network. The two models are compared and evaluated from different aspects, such as the generated outputs, the changes in the average absolute percentage error during training, and the way each one learns to produce the steady-state heat transfer data.
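As an illustration of how such a sample pair might be constructed, the sketch below builds a 64 × 64 input with four fixed boundary temperatures and computes a steady-state label by simple Jacobi relaxation of the Laplace equation. It is only a stand-in for the authors' finite-difference/CFD data generation; the function name, iteration count, tolerance, and row/column orientation are assumptions.

```python
import numpy as np

def make_sample(left, right, bottom, top, n=64, iters=20000, tol=1e-5):
    """Return (input, label): a zero plate with fixed edge temperatures and its
    steady-state temperature field from finite-difference relaxation."""
    plate = np.zeros((n, n), dtype=np.float64)
    plate[:, 0], plate[:, -1] = left, right      # left / right edges
    plate[-1, :], plate[0, :] = bottom, top      # bottom / top edges (orientation assumed)
    inp = plate.copy()                           # network input: zeros except the boundaries
    for _ in range(iters):                       # Jacobi iteration on the interior points
        new = plate.copy()
        new[1:-1, 1:-1] = 0.25 * (plate[:-2, 1:-1] + plate[2:, 1:-1]
                                  + plate[1:-1, :-2] + plate[1:-1, 2:])
        if np.max(np.abs(new - plate)) < tol:
            return inp, new
        plate = new
    return inp, plate

inp, label = make_sample(left=34, right=55, bottom=9, top=60)  # boundary set no. 6 of Table 2
```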
4.1. Comparison and evaluation of the proposed and supervised models
Figure 7 shows the outputs simulated by the proposed model and the supervised model. The first column of the figure shows the input data of this experiment. The inputs to the models are 64 × 64 matrices with zero values whose four edges are set to four different temperatures between 0 and 100 degrees Celsius. Each matrix shows the thermal state of the conductive plate at the first moment of heat application, so the only non-zero entries in the input are at the boundaries; the zero-valued interior appears dark blue in the figure. The four edges around the plate, marked with different colors, represent the different temperatures applied to the plate at the first moment. The boundary conditions of the ten inputs displayed in this column are listed in Table 2. The next three columns show the temperature distribution at every point of the plate, obtained with three different methods after the heat transfer process converges to an equilibrium temperature distribution. The solutions obtained with the finite difference method are taken as the true equilibrium conditions and placed in the second column. The last two columns show the outputs of the proposed and supervised models for the corresponding boundary conditions, respectively.
Table 2 Boundary conditions (in degrees Celsius) of the input data shown in Figure 7
number | left | right | bottom | top
1 | 9 | 23 | 71 | 0
2 | 45 | 94 | 55 | 55
3 | 5 | 45 | 2 | 40
4 | 84 | 99 | 92 | 3
5 | 74 | 71 | 12 | 29
6 | 34 | 55 | 9 | 60
7 | 38 | 0 | 12 | 59
8 | 11 | 78 | 5 | 12
9 | 100 | 21 | 68.44 | 6.9
10 | 34 | 11 | 12 | 29
4.2. Comparison of the changes in the average per-pixel output error of the proposed and supervised models during the training process
Figure 8 shows the changes in the average per-pixel output error (in percent) of the proposed and supervised models during different periods of the training process in this experiment. The pink curve corresponds to the supervised model and the blue curve to the proposed model. The horizontal and vertical axes represent the training epoch and the average per-pixel output error at that epoch, respectively. This error is calculated using Equation (4), where y_true is the correct equilibrium temperature distribution matrix obtained with the finite difference method, y_prediction is the temperature distribution matrix predicted by the corresponding model, and the mean operator averages the values of the resulting matrix.
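For reference, a minimal NumPy sketch of this per-pixel error measure is given below. It assumes the standard mean-absolute-percentage form, with a small constant added to avoid division by zero; the exact normalization used in Equation (4) may differ.

```python
import numpy as np

def mean_abs_percentage_error(y_true, y_prediction, eps=1e-8):
    """Average over all pixels of |y_true - y_prediction| / |y_true|, in percent."""
    return 100.0 * np.mean(np.abs(y_true - y_prediction) / (np.abs(y_true) + eps))
```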
As for the training procedure, experience indicates that when training deep neural networks it is often useful to reduce the learning rate as training progresses [20]. In this experiment, using the Adam optimizer [21], the learning rates of the two models were decreased in steps from 10⁻³ to 10⁻⁶ over consecutive stages of the training process, as shown in Table 3.
The supervised and proposed models were trained for 4,000 and 2,400 epochs, respectively, with a batch size of 128 samples. Each iteration of the optimizer on a single NVIDIA Tesla K80 GPU takes around 18 seconds for both models. Before training, the weights of both networks were initialized to identical random values, so both models started the training process with the same weights. At the beginning of training, the mean absolute percentage error (MAPE) of both models is 87%. According to Figure 8, the error drops very quickly early in training, when the simulation error of both networks is high, and the reduction gradually slows down as accuracy improves. In the first moments of training, the error curve of the supervised model has a much steeper slope than that of the proposed model, which shows that early in training the supervised model learns much faster than the proposed model.
In the supervised model, however, the error reduction does not last long and slows down sharply from the 200th epoch onwards; its training finally stops after about 4,000 epochs with an average error of 2.19 percent per pixel. In contrast, the error reduction lasts longer in the proposed model, although it is relatively slow from the beginning. In this model, the error reduction gradually accelerates with more training epochs, until the curves of the two models intersect at epoch 1800 with an error rate of 4.9%. From this epoch onwards, the proposed model has a lower error than the supervised model.
The slope of the proposed model's curve flattens after about 2,000 epochs, and this model ultimately achieves a lower error rate with a smaller number of training epochs; its training ends after 2,400 epochs with an average output error of 0.68% per pixel. From the comparison of the two curves, it can be inferred that the learning speed of the supervised model is much higher early in training, but the proposed model then reaches a higher accuracy with fewer training epochs. This indicates the strong learning capability of the proposed model in simulating two-dimensional heat transfer. Moreover, the training of the proposed model is faster, requiring 1,600 fewer training epochs than the supervised model.
Table 3 The learning rates of the proposed and supervised models during consecutive stages of the training process
proposed model | | supervised model |
epochs | learning rate | epochs | learning rate
600 | 10⁻³ | 300 | 10⁻³
1000 | 8 × 10⁻⁴ | 2500 | 8 × 10⁻⁴
200 | 5 × 10⁻⁴ | 1000 | 2 × 10⁻⁴
200 | 10⁻⁴ | 200 | 10⁻⁴
200 | 5 × 10⁻⁵ | 200 | 10⁻⁵
200 | 10⁻⁵ | 200 | 10⁻⁶
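The stepwise decay of Table 3 could, for example, be implemented with a Keras learning-rate scheduler like the sketch below, shown for the proposed-model column. It assumes the epochs column gives the number of epochs trained at each rate, so the stage boundaries are the cumulative sums (600, 1600, 1800, 2000, 2200, 2400); the callback is an illustration, not the authors' training script.

```python
import tensorflow as tf

# (last epoch of stage, learning rate) for the proposed model, per Table 3
STAGES = [(600, 1e-3), (1600, 8e-4), (1800, 5e-4), (2000, 1e-4), (2200, 5e-5), (2400, 1e-5)]

def lr_schedule(epoch, lr):
    """Return the learning rate for the stage that the current epoch falls into."""
    for last_epoch, rate in STAGES:
        if epoch < last_epoch:
            return rate
    return STAGES[-1][1]

lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule)
# model.fit(x, y, epochs=2400, batch_size=128,
#           callbacks=[lr_callback])  # model compiled with the Adam optimizer
```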
4.3. Comparison of the learning processes of the proposed and supervised models during the training process
Table 4 shows and compares the learning processes of the proposed and supervised models in steady-state heat transfer simulation during different training epochs. The initial temperature distribution of the square control volume shown in the table is 34, 55, 9, and 60 °C at the left, right, bottom, and top borders of the conductive plate, respectively. Each column of the table shows the output of the corresponding model and its average per-pixel output error at the relevant training epoch. Although the two models eventually produce the same output, each adopts a different way of learning to produce it. According to the left-hand column of the table, which describes the learning process of the supervised model, early in training the network tends to generate outputs that sketch the overall structure of the equilibrium condition, learned by observing the steady-state heat transfer data, and then slowly adds more detail to the output. In this model, the fundamental changes occur in the first training epochs, and only marginal changes are made to the output image from the 250th epoch onwards. The right-hand column of the table shows the output of the proposed model at each training epoch. The proposed model instead tries to reduce the residual obtained by passing the kernel containing the equilibrium-condition pattern across the output; this criterion teaches it that the temperature of each point should be the average of the temperatures of its four adjacent points, as sketched below.
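A minimal sketch of this idea, assuming TensorFlow and a 5-point averaging stencil, is shown below; it illustrates the kind of kernel-based residual described here and is not the authors' exact loss from Equation 2.

```python
import tensorflow as tf

# -1 at the center, 0.25 at the four neighbors: the residual is zero wherever a point
# equals the average of its four adjacent points (the steady-state pattern).
stencil = tf.reshape(tf.constant([[0.0, 0.25, 0.0],
                                  [0.25, -1.0, 0.25],
                                  [0.0, 0.25, 0.0]]), (3, 3, 1, 1))

def equilibrium_residual_loss(y_pred):
    """Mean squared residual of the steady-state stencil over the predicted field."""
    residual = tf.nn.conv2d(y_pred, stencil, strides=1, padding="VALID")
    return tf.reduce_mean(tf.square(residual))
```

Note that any constant field gives a zero residual under this stencil, which is exactly the behavior discussed next.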
Consequently, a plate with constant temperature also satisfies the desired condition. Since the borders of a 64 × 64 plate form a relatively small fraction of the whole plate, early in training the model tries to satisfy the pattern encoded in the kernel by producing outputs in which the entire plate is zero except for the boundaries. As learning progresses, and in order to further reduce the residual obtained from passing the kernel across the output, the network gradually moves towards outputs that follow this pattern to a greater extent. Therefore, early in training the output of the proposed model is far from the correct equilibrium heat distribution, and its learning process appears very slow compared to the supervised model.
However, the output produced by this model is ultimately more accurate, because it follows the principle of the equilibrium temperature pattern encoded in the kernel. The supervised model draws an overall structure of the equilibrium heat distribution from the beginning and then completes it by adding more details of the equilibrium temperature conditions, whereas the proposed model follows the completely opposite procedure: its temperature contours are smooth and detailed from the beginning, but the desired overall structure emerges only after more training epochs. Therefore, the proposed model has a high error rate early in the learning process, but over time, as the overall structure of the equilibrium heat distribution forms, its error decreases faster than that of the supervised model.