Multichannel Matrix Randomized Autoencoder

Existing randomized autoencoders (AEs) are generally designed for vectorized data, which inevitably destroys the original structural information when dealing with multi-dimensional data such as images and videos. To address this issue, a one-side matrix randomized AE (OMRAE) is developed that takes two-dimensional (2D) data as inputs directly, performing a linear mapping on one side of the inputs with matrix multiplication. For multichannel 2D (M2D) data, a multichannel OMRAE (OMMRAE) is proposed that trains the output weights to rebuild each channel of the inputs respectively. In this way, the structural information of each channel and the interactions between channels are explored. Then, a double-side structure using two OMMRAEs to simultaneously extract the row and column structure information of M2D data is developed. Finally, a novel hierarchical matrix randomized neural network is constructed for one-class classification, in which each layer passes information by the bilinear mapping derived from DMMRAE. Experiments are conducted on two benchmark datasets to demonstrate the effectiveness of the proposed methods. Comparisons to several state-of-the-art AEs reveal that the proposed OMMRAE/DMMRAE can obtain better performance with a compact network size. The source code will be available at https://github.com/ML-HDU/MMRAE.

To address the aforementioned issue, a one-side matrix randomized AE (OMRAE) is developed that takes 2D data as inputs directly, performing a linear mapping on one side of the inputs with matrix multiplication. Then, a novel multichannel OMRAE (OMMRAE) is proposed for M2D data that trains the output weights to rebuild each channel of the inputs respectively. Finally, a double-side multichannel MRAE (DMMRAE) is proposed that uses two OMMRAEs in parallel to extract the row and column structure information of M2D data. Compared with the existing vectorization-based RAEs, the proposed OMMRAE/DMMRAE preserves the structural information of each channel while also exploring the interactions between channels. Considering the successes of RAEs in one-class classification (OCC) applications [13,14,17,18], in this paper a novel hierarchical matrix randomized neural network is constructed for OCC (HMRNN-OC), in which each layer passes information by the bilinear mapping derived from DMMRAE. Experiments are conducted on two benchmark datasets to demonstrate the effectiveness of the proposed methods. Comparisons to several state-of-the-art AEs reveal that the proposed OMMRAE/DMMRAE can obtain better performance with a compact network size. The contributions of the paper are summarized as follows.
1. A novel OMMRAE is proposed for M2D data feature learning via a multichannel interaction mechanism, so that the structural information of each channel and the interactions between channels are explored.
2. A DMMRAE, which uses two OMMRAEs in parallel on both sides, is further developed to address the limitation that OMMRAE only performs feature learning with a unilateral projection scheme.
3. A HMRNN-OC framework built by stacking DMMRAEs is developed for OCC, and the experimental results demonstrate the effectiveness of the proposed algorithms compared with several state-of-the-art AE algorithms and OCC algorithms on benchmark datasets.

Related Work
The AE is a direct and efficient feature extraction method and has been widely developed for representation learning in deep neural networks [19][20][21][22][23][24]. The original AE is trained to reconstruct its inputs with the least distortion so that useful representations can be extracted [20]. Many variants of the original AE have been developed, such as the sparse AE [20], the denoising AE [25], the contractive AE [26], the Laplacian AE [27], the variational AE [28], and so on. The recent convolutional AE (CoAE) [29] is a popular method due to its applicability to image and video data. However, the aforementioned AEs usually need a long training time due to the slow convergence of gradient-descent-based back-propagation (BP) methods.
The RNN-AE has attracted much attention due to its fast learning speed, ease of implementation, and low demand for human intervention. The early RNN-AE can be traced back to [5], which uses random hidden-layer parameters without tuning and only trains the output weight for representation learning. The RNN-AE shows superior generalization capacity and training speed compared with many other methods. Subsequently, the ℓ1-norm-penalty-based sparse RNN-AE (RNN-SAE) [6], the kernel RNN-AE (RNN-KAE) [7] using kernel functions, the graph RNN-AE (GRNN-AE) [8,9] regularized by the graph Laplacian manifold, etc., were continually developed. In particular, Yang et al. [10] proposed a generalized RNN-AE that reconstructs not only the original input itself but also the adjacent points, exploiting weights built on the graph Laplacian. Recent improvements of the RAE focus on imposing constraints on the encoded features to obtain a desired feature distribution [12][13][14]. The aforementioned RAEs are generally designed for 1D data, which inevitably destroys the original structure information when dealing with M2D data such as images and videos. The CoAE can learn effective representations directly from M2D data, but there is still a lack of effective representation learning methods in the RAE community.

OMRAE
For 2D data such as grey-scale images and time-frequency spectrograms, the OMRAE utilizes the rule of matrix multiplication to conduct the linear mapping directly on one side of the original 2D inputs. Given the 2D dataset {X_i ∈ R^{D1×D2}, i = 1, ..., N}, where D1 and D2 are the dimensions of the 2D data, OMRAE first randomly generates the input weight W ∈ R^{L×D1} and bias B ∈ R^{L×D2}, where L is the number of hidden neurons. The matrix projection is then conducted on the 2D input X_i, and the hidden-layer output of the i-th sample is derived as

H_i = g(W X_i + B) ∈ R^{L×D2},

with g(·) the activation function. Here β ∈ R^{L×D1} is the output weight to be optimized by minimizing the regularized reconstruction loss

min_β (1/C)‖β‖²_F + Σ_{i=1}^{N} ‖β^T H_i − X_i‖²_F,

and it can be analytically derived by setting the derivative to zero as

β = (Σ_{i=1}^{N} H_i H_i^T + I/C)^{-1} Σ_{i=1}^{N} H_i X_i^T.

The encoded output of OMRAE can be obtained by Y_i = β X_i. It is worth pointing out that the matrix multiplication applied to the 2D inputs seems to be equivalent to a linear transformation on vectorized data, but they are different in essence.
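As a concrete illustration, the closed-form training above can be sketched in NumPy. This is a sketch under the stated dimensions only: the function names, the tanh activation, and the Gaussian initialization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def train_omrae(X, L, C, rng=None):
    """OMRAE sketch.

    X: array of shape (N, D1, D2) -- N two-dimensional samples.
    L: number of hidden neurons; C: regularization parameter.
    Returns the random (W, B) and the closed-form output weight beta.
    """
    rng = np.random.default_rng(rng)
    N, D1, D2 = X.shape
    W = rng.standard_normal((L, D1))   # random input weight, never tuned
    B = rng.standard_normal((L, D2))   # random bias
    H = np.tanh(W @ X + B)             # hidden outputs, shape (N, L, D2)
    # Closed form: beta = (sum_i H_i H_i^T + I/C)^{-1} sum_i H_i X_i^T
    A = np.einsum('nld,nmd->lm', H, H) + np.eye(L) / C
    R = np.einsum('nld,nkd->lk', H, X)   # sum_i H_i X_i^T, shape (L, D1)
    beta = np.linalg.solve(A, R)
    return W, B, beta

def encode_omrae(beta, X):
    # Encoded output Y_i = beta X_i, shape (L, D2) per sample
    return beta @ X
```

Note that, unlike a vectorization-based RAE, the random weight W multiplies each column of X_i with the same L×D1 matrix, which is what keeps the column structure intact.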

Dividing the 2D input by columns, X_i = [x_1, ..., x_{D2}], and the output weight β of (4) by rows, the encoded output is then derived column by column, where each row b_i ∈ R^{1×D1}. In comparison, the 2D input X is vectorized, and the output weight and the encoded output are then obtained by the vectorization-based RAE, where each row b̃_i ∈ R^{1×(D1·D2)}. It can be readily seen from (5) and (7) that the OMRAE retains the column structure information and conducts feature learning on each column vector, whereas the vectorization-based RAE destroys the column structure. In addition, the output weight of the vectorization-based RAE has D2 times as many parameters as the β of OMRAE. In other words, better features can be extracted by the OMRAE with fewer learnable parameters than by the vectorization-based RAEs.
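The parameter saving can be checked with simple arithmetic; the 28×28 input size and L = 100 hidden neurons below are hypothetical values chosen for illustration:

```python
# Output-weight parameter counts for an OMRAE versus a vectorization-based
# RAE, for a hypothetical 28x28 input and L = 100 hidden neurons.
D1, D2, L = 28, 28, 100
omrae_params = L * D1             # beta in R^{L x D1}
vector_params = L * (D1 * D2)     # beta in R^{L x (D1*D2)}
print(omrae_params)               # 2800
print(vector_params)              # 78400
print(vector_params // omrae_params)  # 28, i.e. D2 times more parameters
```

The ratio between the two counts is exactly D2, matching the factor stated above.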

OMMRAE
Effective feature representations can be obtained by OMRAE from original 2D data directly, but it fails to deal with M2D data. For M2D data, the OMRAE is extended to OMMRAE by rebuilding all channels simultaneously from the hidden-layer outputs. Given the M2D dataset, where K is the number of channels, OMMRAE randomly generates K input weights and biases (W^(k), B^(k)), k = 1, ..., K, one pair for each channel, and the hidden-layer output of OMMRAE is obtained by integrating the per-channel projections as in (9). It can be seen that H_i integrates the information of all channels, and the proposed OMMRAE tries to reconstruct every channel of the original M2D input from the integrated H_i. It is believed that the obtained output weights can thus learn more intrinsic features from M2D data. Specifically, the loss function can be expressed as in (10); similarly, the analytical weights can be obtained as in (11), and the encoded output of the i-th sample follows. As shown in (11), OMMRAE trains the output weight β^(k) by rebuilding each channel of the M2D inputs, thus fully mining the structural information of each channel. Fig. 1 shows the detailed architecture of OMMRAE and Algorithm 1 summarizes its pseudo code.
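A minimal sketch of this training scheme follows, assuming the per-channel hidden outputs are integrated by summation; that integration rule, like the function names and tanh activation, is an assumption (the paper's exact rule is its Eq. (9), not reproduced here):

```python
import numpy as np

def train_ommrae(X, L, C, rng=None):
    """OMMRAE sketch.

    X: array (N, K, D1, D2) -- N samples with K channels.
    Per-channel random projections are summed into one integrated hidden
    output H (an assumed integration rule), and each channel is rebuilt
    from that shared H with its own closed-form output weight.
    """
    rng = np.random.default_rng(rng)
    N, K, D1, D2 = X.shape
    Ws = rng.standard_normal((K, L, D1))   # one random weight per channel
    Bs = rng.standard_normal((K, L, D2))   # one random bias per channel
    H = np.zeros((N, L, D2))
    for k in range(K):
        H += np.tanh(Ws[k] @ X[:, k] + Bs[k])   # integrate all channels
    A = np.einsum('nld,nmd->lm', H, H) + np.eye(L) / C
    betas = []
    for k in range(K):   # rebuild each channel from the integrated H
        Rk = np.einsum('nld,nkd->lk', H, X[:, k])
        betas.append(np.linalg.solve(A, Rk))
    return Ws, Bs, np.stack(betas)   # betas: (K, L, D1)
```

Because every β^(k) is solved against the same integrated H, each channel's reconstruction sees the information of all channels, which is the interaction mechanism described above.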

DMMRAE
Both the OMRAE and the OMMRAE are constructed with only a unilateral transformation, and the encoded outputs are also obtained by a unilateral linear mapping. In this way, OMMRAE performs feature learning on the column vectors of the M2D data but omits the correlation information among the row vectors. The resulting disadvantage is that more hidden neurons are needed for feature representation, leading to a high computational complexity. To remedy this

Algorithm 1 OMMRAE
Given: M2D data X_i, the number of hidden-layer neurons L, the regularization parameter C, the activation function g(·).
Steps:
1. Obtain X_i^(k) by dividing the M2D data by channels, as shown in Fig. 1a.
2. Randomly generate (W^(k), B^(k)) and compute the hidden-layer output of each channel by (8), as shown in Fig. 1b.
3. Compute the integrated hidden-layer output by (9), as shown in Fig. 1c.

4. Calculate the analytical solution of the loss function (10).
5. Obtain the output weight β^(k) by (11) and derive the encoded output Y_i.
drawback, two OMMRAEs are conducted in parallel to perform feature learning on the row and column vectors simultaneously, resulting in the DMMRAE. Algorithm 2 summarizes the pseudo code of DMMRAE. It is worth pointing out that the DMMRAE is actually an ensemble strategy that trains two OMMRAEs, one on the original inputs and one on their transposes, respectively. The resulting output weights are used to obtain the encoded outputs by multiplying X_i^(k) on the left and right, respectively. From the above derivation, it can be found that the proposed OMMRAEs can effectively reduce the dimension of the output weight, which is critical for real implementations requiring a compact network size. For example, for a sample X ∈ R^{D1×D2}, in feature learning and optimization the dimension of the output weight of the OMMRAEs is only L×D1 or L×D2, while the dimension of the output weight of the conventional vectorization-based AE is up to L×(D1·D2).
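The two-sided scheme can be sketched as follows. The summation-based channel integration and the bilinear encoding form β_l X β_r^T are assumptions chosen to match the stated dimensions, not the paper's exact equations:

```python
import numpy as np

def _ommrae_betas(X, L, C, rng):
    """Minimal one-side OMMRAE solver (sketch): per-channel random features
    summed into one hidden output, closed-form per-channel output weights."""
    N, K, D1, D2 = X.shape
    H = np.zeros((N, L, D2))
    for k in range(K):
        W = rng.standard_normal((L, D1))
        B = rng.standard_normal((L, D2))
        H += np.tanh(W @ X[:, k] + B)
    A = np.einsum('nld,nmd->lm', H, H) + np.eye(L) / C
    return np.stack([np.linalg.solve(A, np.einsum('nld,nkd->lk', H, X[:, k]))
                     for k in range(K)])          # (K, L, D1)

def train_dmmrae(X, L, C, seed=0):
    """DMMRAE sketch: two OMMRAEs in parallel, one on X (columns) and one
    on the transposed samples (rows)."""
    rng = np.random.default_rng(seed)
    beta_l = _ommrae_betas(X, L, C, rng)                     # left:  (K, L, D1)
    beta_r = _ommrae_betas(np.swapaxes(X, 2, 3), L, C, rng)  # right: (K, L, D2)
    return beta_l, beta_r

def encode_dmmrae(beta_l, beta_r, Xi):
    # Bilinear encoding beta_l^(k) X^(k) beta_r^(k)^T per channel (an assumed
    # form consistent with the stated dimensions), giving (K, L, L) features.
    return np.stack([bl @ x @ br.T for bl, x, br in zip(beta_l, Xi, beta_r)])
```

Note how each side's output weight stays at L×D1 or L×D2, so the ensemble never materializes an L×(D1·D2) matrix.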

Given:
M2D data X_i, the number of hidden-layer neurons L, the regularization parameter C, the activation function g(·).
Steps:

HMRNN-OC
The proposed DMMRAEs are embedded into the HMRNN-OC framework for performance evaluation. The HMRNN-OC consists of a feature learning stage and an OCC stage. The feature learning stage is constructed by stacking several DMMRAEs, and the learning is therefore greedy and layer-wise. Assuming that J DMMRAEs are stacked, the output weights of the j-th DMMRAE, β^(k)_(l)j and β^(k)_(r)j, are derived by Algorithm 2. The encoded output is computed by (12) and then fed to the activation function. Denoting Y_i0 = X_i, the j-th hidden-layer output of the i-th sample can be obtained as in (13). It is natural to extend our proposed methods to solve OCC problems for performance demonstration, because RAEs have achieved encouraging performance in one-class classification applications in the past [14,18,30]. Thus, the hidden-layer output Y_iJ derived by the J DMMRAEs is used for OCC. The final OCC loss function of HMRNN-OC follows [14], regressing the column-stacked deep features onto a constant target t, where cs(·) is the column expansion operator, i.e., the matrix is expanded by columns. The solution can then be derived in closed form as in (15). Finally, the OCC decision is conducted by thresholding, following [14]: the threshold η is selected by rejecting a percentage of training samples as outliers. More details can be found in the pseudo code of Algorithm 3.
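A sketch of the OCC stage under these definitions; the ridge-regression form of the loss and the quantile-based threshold selection are assumptions consistent with the description above, not the exact Eq. (15):

```python
import numpy as np

def occ_fit(features, t=1.0, C=10.0, reject_frac=0.05):
    """One-class stage sketch (HLS-OC-style, an assumed concrete form):
    regress column-stacked deep features onto a constant target t, then set
    the threshold eta by rejecting a fraction of training samples.

    features: (N, D) array holding cs(Y_iJ)^T for each training sample.
    """
    N, D = features.shape
    T = np.full((N, 1), t)
    # Ridge solution beta = (F^T F + I/C)^{-1} F^T T
    beta = np.linalg.solve(features.T @ features + np.eye(D) / C,
                           features.T @ T)
    err = np.abs(features @ beta - t).ravel()     # eps(X_i) = |cs(Y_iJ)^T beta - t|
    eta = np.quantile(err, 1.0 - reject_frac)     # threshold eta
    return beta, eta

def occ_predict(beta, eta, features, t=1.0):
    # A sample is accepted as a target if its deviation from t is within eta.
    return np.abs(features @ beta - t).ravel() <= eta
```

By construction, roughly `reject_frac` of the training samples fall above η, mirroring the rejection-percentage rule quoted from [14].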

Testing stage:
Given a testing sample X_p:
1. For j = 1 : J,

Experiments
Experiments are conducted on two common benchmark image datasets, COIL100 and CIFAR10, for effectiveness evaluation. Particularly, for CIFAR10, one category is chosen as the target and the rest are considered outliers in OCC. For COIL100, we combined several classes with similar object shapes together to formulate the OCC problem. Each target class is indexed by a number following the "-" symbol; for example, CIFAR10-1 means that the first class of CIFAR10 is assigned as the target class. All categories are traversed by the same rule in the experiments. The specifications of the datasets are given below (samples are visualized in Fig. 2).
1. COIL100 is a color image database with 100 objects. Each object is captured in 72 different positions by rotating the object. The size of each image is 128×128, and the dataset contains 5000 training and 2200 test samples. Due to the small number of samples in each class, we combined several classes with similar object shapes together to formulate the OCC problem; for example, we group round jars and medicine bottles together because both look like cylinders.
2. CIFAR10 is an object recognition dataset including 50000 training samples and 10000 testing samples from 4 vehicle and 6 animal classes; each sample is a 32×32 color image. Out of the considered datasets, CIFAR10 is the most challenging due to its diverse content and complexity. Specifically, it should be noted that the other datasets are well aligned and without background, whereas CIFAR10 is not aligned and contains objects of a given class across very different settings. As a result, one-class novelty detection results on this dataset are comparatively weaker for all methods. Among the baseline methods, [21] performs considerably better than the others.
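The one-class protocol described above can be sketched as follows, with a hypothetical helper (name and signature assumed for illustration) that builds a CIFAR10-style target/outlier split from integer labels:

```python
import numpy as np

def one_class_split(X, y, target):
    """Build a one-class task from a labelled dataset: the chosen class is
    the target, everything else is an outlier (the CIFAR10-<target>
    convention described above).

    X: (N, ...) samples; y: (N,) integer labels.
    Returns targets-only training data, all samples, and +/-1 test labels.
    """
    target_mask = (y == target)
    X_train = X[target_mask]                 # train on target samples only
    y_test = np.where(y == target, 1, -1)    # +1 = target, -1 = outlier
    return X_train, X, y_test
```

Traversing `target` over all class indices reproduces the "all categories by the same rule" setup used in the experiments.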

Sensitive Analysis of Hyper-Parameters
Figure 3 shows the parameter sensitivities of OMMRAE and RNN-AE with respect to the number of hidden-layer neurons and the regularization parameter (L, C). As can be seen, the proposed MMRAEs are generally less sensitive to the number of hidden-layer neurons and the regularization parameter than RNN-AE. For OMMRAE, the overall performance becomes stable and convincing when L ≥ 100. We use the HLS-OC algorithm embedded with each MMRAE to conduct the optimization over the parameter grid established above.

The proposed matrix AEs achieve high accuracy in many categories. Among all 15 testing cases, OMMRAE and DMMRAE win the highest AUC in 10 cases, while RNN-SMA, RNN-AE, and RNN-SAE only offer the best results in 2, 1, and 2 cases, respectively. Besides, the comparison between OMMRAE and DMMRAE shows that DMMRAE performs considerably better than OMMRAE: adopting the double-side MMRAE wins the highest AUC in 9 cases.

Compared with Typical AE Algorithms
In Fig. 4, ROC curves are also depicted to visually demonstrate the advantages of the proposed matrix AEs in feature learning. It is clearly observed that: (1) in Fig. 4a and b, our proposed MMRAEs generally achieve the best performance among all compared AEs; (2) among our proposed MMRAEs, adopting double-side feature learning (namely DMMRAE) normally performs better than single-side feature learning (namely OMMRAE).
Network complexity is an important factor for real implementation. Compared with conventional vector/scalar AEs, a key merit of the MMRAEs is their compact network structure, achieving comparable or better performance than SOTA AEs. Fig. 5 shows the number of hidden nodes needed to achieve the best performance for our proposed OMMRAE/DMMRAE and the compared SOTA AEs. The comparisons are presented on experiments with three category datasets each from CIFAR10 and COIL100, respectively. As clearly depicted, for both the CIFAR10 and COIL100 datasets, the number of hidden nodes needed by our proposed OMMRAE/DMMRAE to achieve the best performance is generally far less than that of the compared AEs.

Fig. 1 The architecture of the OMMRAE. a Divide the M2D data by channels. b Compute the hidden-layer output of each channel. c Compute the integrated hidden-layer output. d Reconstruct the M2D data by the output weights. The output weights are used to obtain the encoded outputs by multiplying X_i^(k) on the left and right, respectively.

2. Train the OMMRAE by Algorithm 1 with the inputs X_i and obtain the left-hand-side encoded weight β_(l)^(k).
3. Train the OMMRAE by Algorithm 1 with the transposed inputs X̃_i and obtain the right-hand-side encoded weight β_(r)^(k).
4. Derive the encoded outputs by (12).
the number of stacked DMMRAEs J, and C_j, L_j, j = 1, ..., J.
Training stage:
1. For j = 1 : J:
(a) Compute β^(k)_(l)j and β^(k)_(r)j by Algorithm 2.
(b) Compute the hidden-layer output Y_ij by (13).
2. Calculate the output weight by (15).
3. Derive the training errors by ε(X_i) = ‖cs(Y_iJ)^T β − t‖.
4. Select the threshold η by rejecting a percentage of training samples as outliers.

Fig. 3 Visualization of the parameter optimization on the COIL100 and CIFAR10 datasets, obtained by grid optimization in OMMRAE (left) and RNN-AE (right)

Fig. 4 ROC comparisons with SOTA AEs
Fig. 5

Table 1 compares the AUC obtained by the four SOTA AEs and the proposed OMMRAE and DMMRAE algorithms. All methods are applied in the HLS-OC framework with a single AE for feature learning. The best results are marked in bold in the table. As highlighted, the proposed matrix AEs can effectively improve the classification performance.