To overcome the challenges of conventional generative models outlined earlier, a novel architecture is proposed that avoids the duplication of resources, such as neurons and synapses, found in generative neural network models such as autoencoders [7,23]. The advantages of the proposed solution are that it satisfies the requirements of biological feasibility, generality and evolvability while, as discussed further in this paper, being capable of successfully learning concepts in data of minimal but realistic conceptual content.
2.1 Bidirectional Generative Architecture
The proposed network architecture is based on bidirectional synapses and a two-phase training cycle.
A bidirectional synapse differs from a neural synapse in the conventional feed-forward neural network architecture by its ability to operate in both directions, having two sets of trainable parameters, “forward” and “back”: Wf = { wab,f, bf }, Wb = { wab,b, bb }.
The use of bidirectional synapses avoids the redundancy problems of conventional generative models such as autoencoders by reducing the required resources, both neurons and synapses, by close to half, while fully retaining the universal approximation capacity of these models.
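The resource saving can be illustrated with a simple count; the layer sizes below are arbitrary examples for illustration, not the models used in this study:

```python
# Illustrative resource count: a conventional autoencoder mirrors its
# encoder in a separate decoder, while a bidirectional network reuses
# the same neurons and synapses, with each synapse carrying a forward
# and a backward weight. Layer sizes are arbitrary examples.

def synapses(sizes):
    """Number of connections in a chain of fully connected layers."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

encoder = [784, 100, 10]               # input -> deep -> latent
ae_layers = encoder + encoder[-2::-1]  # "unfolded" autoencoder with mirrored decoder
ae_neurons, ae_synapses = sum(ae_layers), synapses(ae_layers)

bi_neurons, bi_synapses = sum(encoder), synapses(encoder)

print(ae_neurons, bi_neurons)    # 1778 894  -> neurons close to halved
print(ae_synapses, bi_synapses)  # 158800 79400 -> physical synapses halved
```

The bidirectional model keeps the same number of trainable parameters (two sets per synapse) but roughly halves the physical neurons and synapses, which is the saving argued for above.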
2.2 Training
In the training phase the model effectively “unfolds”, with subsequent training equivalent to that of conventional generative architectures such as the autoencoder. In the first, forward pass through the network, the forward parameters are used first, in the direction Wi,f to Wl,f; then the backward ones are used in the reverse direction, i.e. from the latent segment to the input, Wl,b to Wi,b, resulting in the generated output:
$$y = T_{b} \times T_{f}\left(x\right)\tag{1}$$
where y is the output generated by the model from the observable input x, and Tf and Tb are the tensors obtained with the forward and backward sets of model parameters described in Section 2.1, respectively.
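The forward pass of Eq. (1) can be sketched for a single bidirectional layer; sizes, the sigmoid activation, and the random weights are illustrative assumptions, not the paper's trained models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 16-dim observable input, 3-dim latent segment.
n_in, n_lat = 16, 3

# One bidirectional layer: the same synapses hold two trainable
# parameter sets, forward {W_f, b_f} and backward {W_b, b_b}.
W_f, b_f = rng.normal(size=(n_in, n_lat)), np.zeros(n_lat)
W_b, b_b = rng.normal(size=(n_lat, n_in)), np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def T_f(x):   # encoding direction, Wi,f -> Wl,f
    return sigmoid(x @ W_f + b_f)

def T_b(l):   # generative direction, Wl,b -> Wi,b
    return sigmoid(l @ W_b + b_b)

x = rng.normal(size=(1, n_in))
y = T_b(T_f(x))   # Eq. (1): generated output from observable input
print(y.shape)    # (1, 16)
```

The output y then enters the learning phase, where the deviation of y from x drives the parameter updates.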
In the second, learning phase, the parameters are adjusted in the opposite direction, i.e. from Wi,b to Wi,f, based on the deviation of the output from the input as defined by the cost function. Any standard optimization method can be used, including gradient descent or its biologically feasible implementations.
By definition, training of a bidirectional model of this type is equivalent to that of a dual, conventional symmetrical model of approximately double the size (excluding the latent component), with parameters { Wi,f .. Wl,f, Wl,b .. Wi,b }, in the sense that under the same training conditions it would produce the same configuration of trained parameters and, consequently, the same distributions of data in the latent representation component of the model. In the simplest case, as in the examples discussed further, the latent component can be a single encoding layer producing a latent representation described by the coordinates of neuron activations; in other cases, it can be a set of layers with a more complex structure of the latent representation.
In the training phase a model can be trained in an unsupervised generative process on a subset of sensory data that can be sampled randomly or via an enhanced process of selection of training samples. In the operation phase of a trained model, the encoding and generative parts of the model, described by the tensors T(f) and T(b) respectively, are effectively “disconnected”, producing an encoding transformation E from the observable space to the latent representation, and a generative one, G, operating in the opposite direction:
$$r = E\left(X\right) = T^{\left(f\right)}\left(X\right);\quad y = G\left(l\right) = T^{\left(b\right)}\left(l\right)\tag{2}$$
where X is an observable sample; r, the latent image of X; and y, the observable interpretation of a latent position l generated by the model.
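The operation phase of Eq. (2) can be sketched as follows; the random weights stand in for a trained model, and the tanh activation and sizes are illustrative assumptions:

```python
import numpy as np

# Operation phase sketch: after training, the forward and backward
# parameter sets act as two disconnected maps, an encoder E and a
# generator G. Weights are random stand-ins for trained parameters.
rng = np.random.default_rng(3)
n_in, n_lat = 16, 3
W_f = rng.normal(size=(n_in, n_lat))
W_b = rng.normal(size=(n_lat, n_in))

E = lambda X: np.tanh(X @ W_f)   # r = E(X) = T^(f)(X)
G = lambda l: np.tanh(l @ W_b)   # y = G(l) = T^(b)(l)

X = rng.normal(size=(1, n_in))
r = E(X)                          # latent image of the observable sample
y = G(r)                          # observable interpretation of a latent position
print(r.shape, y.shape)           # (1, 3) (1, 16)
```

Note that G can also be applied to arbitrary latent positions l, not only to encoded samples, which is what makes the trained model generative.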
2.3 Learning and Generative Ability
The statement of equivalence between training of a bidirectional generative model and the dual symmetric autoencoder-type model will be used extensively throughout this work, given the significant number of results obtained with such models [13–20,24].
With regard to training and learning success, as discussed in a number of earlier results [7,20], the success of generative learning can be verified with unsupervised methods that do not require prior knowledge about the conceptual content of the sensory data, such as:

- monitoring the values of training parameters, such as the cost function, during the training process;
- correlation measures of the input and generated output samples during and after training;
- verification of the generative ability of trained models by generating a subset of samples of the types represented in the training set.
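The second check above, input–output correlation, can be sketched as follows; the data, noise level, and the use of per-sample Pearson correlation are illustrative assumptions:

```python
import numpy as np

def reconstruction_correlation(X, Y):
    """Mean per-sample Pearson correlation between inputs X and outputs Y."""
    corrs = []
    for x, y in zip(X, Y):
        # correlate the flattened input image with its reconstruction
        corrs.append(np.corrcoef(x.ravel(), y.ravel())[0, 1])
    return float(np.mean(corrs))

X = np.random.default_rng(1).random((10, 8, 8))               # toy "input images"
Y = X + 0.05 * np.random.default_rng(2).normal(size=X.shape)  # near-faithful reconstructions
print(round(reconstruction_correlation(X, Y), 3))             # close to 1 for successful training
```

A correlation approaching 1 over the training set indicates the generated outputs track the inputs, without requiring any labels or prior knowledge of the data's conceptual content.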
2.4 Experimental Datasets
In the evolvable learning approach, the objective differs from conventional methods: what is sought is not a specific and highly specialized, perhaps very complex, architectural solution, such as a number of recent neural architectures [25,26] that have proved successful in analyzing real-world data of high complexity; rather, it is establishing possible pathways by which models of plain architecture and limited complexity could evolve toward successful analysis and representation of data of increasing complexity.
To follow this objective, we start with generic models of limited complexity, and data of limited, though still realistic in some simple environments, conceptual content. Two essential objectives of this study were to demonstrate that such standard “vanilla” models are capable of successful self-learning with simple conceptual data, and then to demonstrate their ability to evolve toward successful learning of more complex data in an incremental way that does not require massive addition of resources or architectural modifications.
In following this program, several datasets of images of basic geometric shapes, such as circles, triangles and backgrounds, with differences in content and complexity measured by the variety of content, were created. While the images represented simple shapes, the intent was for the data to have a certain realistic context for simple learning systems; for example, different types of shapes can be associated with sources of food versus predators and general background in some simple natural environments.
The first dataset of grayscale images, Shapes1 (G1), consisted of 600 images of circles, triangles and grayscale backgrounds, with two representative samples per shape and variation in the size and the foreground/background contrast.
The second dataset, Shapes2 (G2), contained 1,000 grayscale images of circles, triangles and backgrounds, with variation of the size in the range 0.3–1.0 of the image size (i.e., from 0.3 × 64 pixels), and variation of the foreground vs. background contrast for each size.
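A generator in the spirit of these datasets can be sketched as follows; the 64 × 64 resolution matches the size parenthetical above, while the intensity levels and the centered-circle construction are illustrative assumptions:

```python
import numpy as np

SIZE = 64  # image side in pixels, per the 0.3 x 64 size reference

def make_background(level):
    """Uniform grayscale background at the given intensity."""
    return np.full((SIZE, SIZE), level, dtype=float)

def make_circle(rel_size, fg, bg):
    """Centered circle of diameter rel_size * SIZE, intensity fg on background bg."""
    img = make_background(bg)
    yy, xx = np.mgrid[:SIZE, :SIZE]
    r = rel_size * SIZE / 2
    mask = (yy - SIZE / 2) ** 2 + (xx - SIZE / 2) ** 2 <= r ** 2
    img[mask] = fg
    return img

# one sample: half-size circle with strong foreground/background contrast
img = make_circle(0.5, fg=1.0, bg=0.2)
print(img.shape)  # (64, 64)
```

Sweeping `rel_size` over 0.3–1.0 and varying the `fg`/`bg` contrast yields the kind of size and contrast variation described for Shapes2.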
The third dataset, ShapesC (C), contained 1,200 color images as described in Table 2: circles of two colors, red and blue, of different size and contrast to the background; triangles of two colors with the same characteristics; horizontal stripes of two types, wide red and narrow blue; and empty grayscale backgrounds.
In the artificially generated datasets the images were centered, symmetrical and had no rotation, based on the argument [27] that a separate orientation function could be effective in producing sufficient quality of observations without the significant cost associated with neural networks of higher depth and complexity.
Finally, as an example of real-world image data, the MNIST dataset of handwritten digits [28] was used.
The range of datasets used in the study allowed us to evaluate the learning ability of the models in terms of learning success, generative quality and the ability to generalize. One can introduce a measure of conceptual complexity of the observable data as the number of characteristic patterns that, at least in some cases, can be identified without prior knowledge about the content and semantics of the data. The difference in conceptual complexity of the image data across the datasets allowed us to make a number of observations on the learning success of generative models with respect to data of increasing complexity.
2.5 Generic Generative Architecture
An essential expectation for a biologically feasible architecture is evolvability, that is, the possibility of an incremental change, or sequence of changes, from a simpler to a more complex architecture associated with improved learning capacity.
A generative neural network architecture can be described by three essential components, performing the functions of physical adaptation, or rendering; deep processing; and latent representation (RDL).
In the rendering stage the observed sensory data is transformed into an invariant numerical representation. This stage is specific to the data and sensory mechanisms (for example, light-sensitive elements for visual data, and auditory or olfactory neurons for other senses), producing output in the format of a numeric vector that is fed into the next stage of processing.
The deep stage of a standard architecture consists of a number of deep layers with possible additional features such as sparsity constraints, residual connections and so on. This stage represents the brawn of the model, allowing it to produce features at different scales for an effective description of the observed data.
In the final, representation stage, the output of the deep stage is compressed to produce an effective low-dimensional representation of the data, i.e. a small set of effective features that can be related to activations of neurons in the representation block of the model. The representation stage can be a single layer in the simplest models, or a more complex combination of layers.
Following the objectives of the study, generic models of minimal complexity were used in this work. For visual data, the rendering stage (R) consisted of a number of convolution-pooling layers to capture features at larger scales. The deep stage consisted of a single fully connected layer of a constant size N (DN, e.g. D30). Finally, the latent stage was represented by a single layer of a constant size M (LM, e.g. L10), with the latent coordinates in the representation space defined by the activations of the neurons in the latent layer, { l1, .., lM }.
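The shape bookkeeping of this minimal R-D-L pipeline can be sketched as follows; average pooling stands in for the learned convolution-pooling stages, and the sizes (64 × 64 input, D50, L10) and tanh activations are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool2(img):
    """2x2 average pooling: a stand-in for a learned convolution-pooling stage."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

x = rng.random((64, 64))   # one grayscale input image
for _ in range(2):         # rendering stage R: 2 pooling stages
    x = avg_pool2(x)       # 64x64 -> 32x32 -> 16x16
v = x.ravel()              # flatten to a 256-dim feature vector

W_d = rng.normal(size=(v.size, 50))  # deep stage: one fully connected layer, D50
d = np.tanh(v @ W_d)

W_l = rng.normal(size=(50, 10))      # latent stage: one flat layer, L10
l = np.tanh(d @ W_l)                 # latent coordinates { l1, .., l10 }
print(l.shape)                       # (10,)
```

The same skeleton extends to the incremental variants in Table 1 by widening the latent layer and adding a sparsity constraint on its activations.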
The conceptual complexity of the data represented in the datasets varied from C = 3 for datasets Shapes1,2 to well over 10, and possibly up to 100, for the MNIST dataset of handwritten digits.
The architecture and parameters of the models used in the study are shown in Table 1.
Table 1
Minimal generative architecture

| Architecture | Data | Rendering | Depth | Latent |
|---|---|---|---|---|
| Minimal | Grayscale shapes, Shapes1,2 | Convolution, 2–3 stages | 1 layer (D50–100) | 1 flat layer, L3–10 |
| Incremental 1 | Color shapes, ShapesC | | | 1 sparse layer, L5–10S, l1 = 10^-5 |
| Incremental 2 | Handwritten digits, MNIST | | | 1 sparse layer, L10–24S, l1 = 10^-4..-5 |
The generative neural models used in the study were implemented in Keras/TensorFlow [29]; several common data analysis and machine learning packages and libraries were also used.