3.1. Parallel storage system structure
Distributed shared parallel storage can be expanded on a large scale and can provide continuous, highly available processing bandwidth. At the same time, its architecture and file system are more open, which facilitates user management and secondary development. As a result, most modern supercomputing centers use a distributed shared parallel storage architecture. The block diagram is shown in Fig. 1.
Considering the structural characteristics of massively parallel storage, an ideal parallel file system model was first created. The model is largely based on this architecture; the meaning of each parameter is given in Table 1:
Table 1
The meaning of each parameter

Parameter | Meaning | Parameter | Meaning
Ci | A computing node | Wn | Network bandwidth
C | Compute node collection | ΔTm | Delay in processing an I/O request by the metadata server
M | Metadata server | Ws | Maximum I/O bandwidth of the data server
Sj | A data server | Vi | File size requested by a single I/O of a compute node
L | Number of compute nodes | K | Number of data servers
S | Data server collection | Oj | Correlation conflicts caused by multiple I/O requests
| For an access of size V, the number of I/O requests generated | D | The size of the data requested by each I/O
Pj | I/O performance when there are j conflicts (the larger the value, the worse the I/O performance) | Tm | For an access of size V, the time overhead required by the metadata server
Tn | Data delay through the network | Ts | Data server read and write time overhead
ηc | Cache hit rate of data in the computing node | ηo | Cache hit rate of data on the data server
Dc | Delay overhead for a client i to access data | Wc | Bandwidth of a client i to access data
An ideal mathematical model is used to describe the process of a client accessing the data server in the parallel storage system, as shown in Fig. 2.
If L computing nodes access the metadata server at the same time, the metadata server waiting time of computing node Ci can be expressed as formula (1):
$${T}_{M}\left({V}_{i}\right)=\rho \left(L\right)\phi \left(\sum {V}_{i}\right)\varDelta {t}_{m}$$
1
Assuming that the cache hit rates do not differ much across nodes, formulas (2) and (3) can be obtained:
$${\eta }_{{C}_{i}}={\eta }_{{C}_{j}}={\eta }_{C}$$
2
$${\eta }_{{O}_{i}}={\eta }_{{O}_{j}}={\eta }_{O}$$
3
The hard-disk performance degradation caused by data access conflicts arriving at the data server is characterized by the parameter in formula (4):
$$Q=L\times \left(1-{\eta }_{C}\right)\times \left(1-{\eta }_{0}\right)$$
4
Since the original data server still incurs a large locking overhead after a conflict is detected, and its busy time is therefore long, the metadata conflict overhead is given by formulas (5) to (9):
$$Q=L\times \left(1-{\eta }_{C}\right)$$
5
$${T}_{N}=\frac{{V}_{i}\times \left(1-{\eta }_{{C}_{i}}\right)}{{W}_{N}}$$
7
$${T}_{S}=\frac{{V}_{i}\times \left(1-{\eta }_{{C}_{i}}\right)\left(1-{\eta }_{O}\right)}{{K}_{i}{W}_{S}}$$
8
$${D}_{Ci}={V}_{i}\left(1-{\eta }_{Ci}\right)\left(\frac{\rho \varDelta {t}_{m}}{D}+\frac{1}{{W}_{N}}+\frac{1-{\eta }_{oi}}{{K}_{i}{W}_{S}}\right)$$
9
Since the network is non-blocking, the average network delay is given by formula (10):
$$\overline{{T}_{N}}=\frac{\sum \left({V}_{i}\times \left(1-{\eta }_{Ci}\right)\right)}{L\times {W}_{N}}=\left(1-{\eta }_{C}\right)\times \frac{\overline{V}}{{W}_{N}}$$
10
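As a small numerical sketch of this ideal model, the Python fragment below evaluates the conflict parameter of formula (4), the per-client delay overhead of formula (9), and the average network delay of formula (10). Every parameter value, and the treatment of ρ(L) as a constant factor, is an assumption made only for illustration, not a configuration from the paper.

```python
# Illustrative evaluation of the ideal-model delay formulas; all values are assumed.
L = 1024            # number of compute nodes
eta_c = 0.6         # client-side cache hit rate
eta_o = 0.3         # data-server cache hit rate
V_i = 64 * 2**20    # bytes requested by one client access
D = 4 * 2**20       # bytes per individual I/O request
W_N = 12.6e9        # network bandwidth per link (B/s)
W_S = 2.0e9         # maximum I/O bandwidth of one data server (B/s)
K_i = 8             # number of data servers striping the request
rho = 1.5           # metadata contention factor rho(L), treated as a constant here
dt_m = 50e-6        # metadata server delay per request (s)

# Formula (4): requests that actually reach the disks after both cache layers.
Q = L * (1 - eta_c) * (1 - eta_o)

# Formula (9): delay overhead D_Ci for client i.
D_Ci = V_i * (1 - eta_c) * (rho * dt_m / D + 1 / W_N + (1 - eta_o) / (K_i * W_S))

# Formula (10): average network delay for a non-blocking network.
T_N_avg = (1 - eta_c) * V_i / W_N

print(f"Q = {Q:.0f} conflicting requests, D_Ci = {D_Ci*1e3:.2f} ms, T_N = {T_N_avg*1e3:.2f} ms")
```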
After the storage system was built, it was tested. The configuration of the test system is shown in Table 2.
Table 2

Equipment name | Quantity | RAM | Description | Storage
IO Service Acceleration Node (ION) | 512 | 8 × 16 GB | Dual-channel high-performance Intel Xeon CPU E5-2692 v2, 12 cores, 2.2 GHz; 1 self-developed high-speed interconnection network interface and 1 IB interface (the former connects to the compute nodes, the latter to the storage network); two 1 TB PCIe solid-state storage cards. | 2 TB solid-state storage
Client (computing node) | 19840 | 12 × 16 GB | Dual-channel high-performance Intel Xeon CPU E5-2692 v2, 12 cores, 2.2 GHz; 1 self-developed high-speed interconnection network interface and 1 IB interface. | None
Self-developed network | | | The measured bandwidth of a single link is 12.6 GB/s. |
The clients read from and write to a single ION; the number of computing nodes ranges from 1 to 128, with one process per node. The test results are shown in Fig. 3.
Figure 3 shows that the read and write bandwidth initially grows linearly; however, once the number of tested clients exceeds 32, the throughput begins to decrease. By default, the clients are grouped in units of 64. If the user program needs more I/O bandwidth, the administrator can increase the available I/O bandwidth by changing the HVN. Because the number of computing nodes was insufficient in the 64-node test, some computing nodes ran two test processes. The test results are shown in Fig. 4:
The test environment is shown in Table 3.
Table 3

Equipment name | Quantity | RAM | Description | Storage
IO Service Acceleration Node (ION) | 512 | 8 × 16 GB | Dual-channel high-performance Intel Xeon CPU E5-2692 v2, 12 cores, 2.2 GHz; 1 self-developed high-speed interconnection network interface and 1 IB interface (the former connects to the compute nodes, the latter to the storage network); two 1 TB PCIe solid-state storage cards. | 2 TB solid-state storage
Client (computing node) | 19840 | 12 × 16 GB | Dual-channel high-performance Intel Xeon CPU E5-2692 v2, 12 cores, 2.2 GHz; 1 self-developed high-speed interconnection network interface. | None
Storage server | 128 | 12 × 16 GB | Dual-channel high-performance Intel Xeon CPU E5-2692 v2, 12 cores, 2.2 GHz; 1 self-developed high-speed interconnection network interface and 1 IB interface. | 64 IB-SANs in total, each with a storage capacity of 64 × 4 TB
Self-developed network | | | The measured bandwidth of a single link is 12.6 GB/s. |
3.2. Technical basis of Android voice assistant
The main process of the dual-threshold endpoint detection algorithm is as follows. Working from the short-time average energy and the short-time zero-crossing rate, a low threshold is first set to catch the weaker part of the signal, and a higher threshold is then set for the stronger part. When both the average energy and the zero-crossing rate fall back below the lower threshold, the position is marked as a candidate end point, and the energy and zero-crossing rate of the subsequent frames continue to be measured. If both parameters rise again and exceed the upper threshold, the effective voice signal is judged not to have ended and the currently marked end point is deleted; if the two indicators are not exceeded again, the end point stands. Since a control word can contain several characters, the audio signal repeats the process of starting, ending, and restarting; it is therefore necessary to find the start point of the first mark and the end point of the last mark to determine the exact extent of the control word. The block diagram of the dual-threshold detection algorithm is shown in Fig. 5.
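A minimal sketch of this dual-threshold logic is shown below. It is not the paper's implementation; the frame length, the threshold values, and the exact short-time energy and zero-crossing definitions are assumptions made for illustration.

```python
import numpy as np

def short_time_features(signal, frame_len=256):
    """Split the signal into frames and compute short-time energy and zero-crossing rate."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len).astype(float)
    energy = np.sum(frames ** 2, axis=1)
    zcr = np.sum(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr

def dual_threshold_endpoints(energy, zcr, e_low, e_high, z_low, z_high):
    """Return (start, end) frame indices covering the detected control word, or None."""
    segments = []
    start = None
    for i, (e, z) in enumerate(zip(energy, zcr)):
        if start is None:
            # A weak signal crossing the low threshold marks a candidate start.
            if e > e_low or z > z_low:
                start = i
        else:
            # When both features fall back below the low threshold, mark a candidate end.
            if e < e_low and z < z_low:
                segments.append((start, i))
                start = None
    if start is not None:
        segments.append((start, len(energy)))
    # Keep only segments that actually crossed the high threshold (true speech), then
    # take the first start and the last end to cover a multi-character control word.
    confirmed = [(s, t) for (s, t) in segments
                 if np.any(energy[s:t] > e_high) or np.any(zcr[s:t] > z_high)]
    if not confirmed:
        return None
    return confirmed[0][0], confirmed[-1][1]
```

In this sketch the low threshold determines the candidate boundaries, while the high threshold filters out short noise bursts that never reach speech-level energy or zero-crossing rate.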
The linear activation function is given by formula (11):
$${y}_{k}=f\left({v}_{k}\right)=K{v}_{k}$$
11
The threshold logic activation function is given by formula (12):
$${y}_{k}=f\left({v}_{k}\right)=\left\{\begin{array}{c}1\\ 0\end{array}\right.\begin{array}{c}{v}_{k}\ge {\theta }_{k}\\ {v}_{k}<{\theta }_{k}\end{array}$$
12
The S-type (logistic) activation function is given by formula (13):
$${y}_{k}=f\left({v}_{k}\right)=\frac{1}{1+{e}^{-{v}_{k}}}$$
13
The hyperbolic tangent activation function is given by formula (14):
$${y}_{k}=f\left({v}_{k}\right)=\tanh\left({v}_{k}\right)$$
14
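The four activation functions can be written compactly as below. This is only a sketch: the slope K and the threshold θ are free parameters, and the standard logistic form is assumed for the S-type function.

```python
import numpy as np

def linear(v, K=1.0):
    # Formula (11): y = K * v
    return K * v

def threshold_logic(v, theta=0.0):
    # Formula (12): y = 1 if v >= theta, else 0
    return np.where(v >= theta, 1.0, 0.0)

def sigmoid(v):
    # Formula (13), S-type (logistic) activation: y = 1 / (1 + exp(-v))
    return 1.0 / (1.0 + np.exp(-v))

def tanh_activation(v):
    # Formula (14): y = tanh(v)
    return np.tanh(v)
```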
The output of the neurons in the output layer is given by formula (15):
$${y}_{j}={f}_{2}\left({\sum }_{k=1}^{q}{w}_{jk}{z}_{k}\right)$$
15
The error Ep of the p-th sample is given by formula (16):
$${E}_{p}=\frac{1}{2}{\sum }_{j=1}^{m}{\left({t}_{pj}-{y}_{pj}\right)}^{2}$$
16
For all samples, the global error is given by formula (17):
$$E=\frac{1}{2}{\sum }_{p=1}^{P}{\sum }_{j=1}^{m}{\left({t}_{pj}-{y}_{pj}\right)}^{2}={\sum }_{p=1}^{P}{E}_{p}$$
17
The error signal is defined as in formula (18):
$${\delta }_{xj}=-\frac{\partial {E}_{p}}{\partial {S}_{j}}=-\frac{\partial {E}_{p}}{\partial {y}_{j}}\cdot \frac{\partial {y}_{j}}{\partial {S}_{j}}$$
18