Dynamic Response Prediction of Bi-State Emission of Quantum Dot Lasers Based on Machine Learning

Dual-state emission is a common and important phenomenon in semiconductor quantum dot lasers (QDLs) at different temperatures and operating conditions, usually investigated through microscopic carrier-interaction modeling or rate-equation-based approaches. In this study, we revisit the topic from a system identification perspective: we build black-box models based on artificial neural networks, namely the Multilayer Perceptron, the Extreme Learning Machine and a hybrid Echo State Network - Extreme Learning Machine. As a case study, we focus on the switch-on transient and its prediction. The study revealed that the model was able to separate and predict, from the total power alone and without using any QDL design parameters, the optical power around the ground-state and first-excited-state lasing lines of an InAs/InGaAs quantum dot laser. The error performance was as low as an RMSE of 2.81 μW and a MAPE of 0.50%, with a processing time (training plus testing) of 15.27 s. This enables the alternative model to be used in optical filtering instrumentation as a low-resolution, low-cost filter for applications in which a spectrum analyzer is not economically viable and can be replaced by a simple optical power meter.


Quantum Dot Semiconductor Lasers
A semiconductor quantum dot is a nanometer-sized structure able to confine charge carriers in all three spatial directions, which is known to affect its optical emission properties [2]; as such, it can be used for the engineering of light-emitting or light-detecting devices. For example, quantum dot lasers rely on the use of quantum dots in the active region, and on their intrinsically high differential gain, to produce coherent light through stimulated photon emission following the recombination of electron-hole pairs confined therein.
In the literature, much of the comprehension of the underlying physics of quantum dot devices can be achieved with the help of a simplified description of carrier scattering events, in which they take place on a regular, periodic time basis. Figure 1 illustrates this: it shows the QDL carrier dynamics as a very simple three-level system. The first level, the carrier reservoir (RS), is where the carriers are injected; they are then captured into the ES (second level) within the time τ_RS→ES and relax into the GS (third level) within the time τ_ES→GS. Carriers can also be thermally excited, returning from the GS to the ES (in time τ_GS→ES) and from the ES to the RS (in time τ_ES→RS). Additionally, carriers can recombine spontaneously at all three levels (in times τ^spon_RS, τ^spon_ES and τ^spon_GS) [2].
Simplified diagrams like Fig. 1 are the basis for nonlinear coupled rate-equation models. Although these are a popular choice among researchers in the QDL scientific community, they involve a large number of equations and, for the purpose of proper time-domain response prediction, they usually require a sophisticated description of the underlying carrier-carrier scattering phenomena and the photon-matter interaction. In such models, I is the current injected into the RS (Fig. 1); q is the elementary charge; ρ_GS and ρ_ES are the carrier occupation probabilities in the GS and ES, respectively; τ_p is the photon lifetime; Γ_p is the optical confinement factor; β_sp is the spontaneous emission factor; and v_g is the group velocity of the light. α_GS and α_ES are the carrier-population self-contributions of the GS and ES to the linewidth broadening factor (LBF) [2] of the GS and ES, respectively. k^RS_ES and k^GS_ES are the carrier-population cross-contributions of the RS and GS to the LBF of the ES, respectively; k^RS_GS and k^ES_GS are the cross-contributions of the RS and ES to the LBF of the GS, respectively. g_GS and g_ES are the material gains of the GS and ES, respectively, written in terms of [2] the differential gains a_GS and a_ES; the gain compression factors ζ_GS and ζ_ES of the GS and ES, respectively; the total number of quantum dots N_B; the active-region volume V_B; and the volume V_S occupied by the photons in the laser cavity; g_RS is the RS gain. F_RS, F_ES and F_GS are the carrier noise sources in the RS, ES and GS, respectively; F^S_ES and F^S_GS are the spontaneous-emission photon noise sources, and F^φ_ES and F^φ_GS the spontaneous-emission phase noise sources, of the ES and GS, respectively [2].
So, using rate-equation modeling for the purpose discussed here would be using a sledgehammer to crack a nut. As an alternative, we consider the QDL and its intrinsic nonlinear behavior from an input-output pictorial representation, thus encompassing the dynamics in a nonlinear mathematical model able to capture the way this very rich and intricate dynamics appears at the output for different operating conditions. Such a modeling approach is the subject of the upcoming sections.
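For concreteness, the snippet below is a minimal sketch of the simplified three-level carrier picture of Fig. 1, tracking only the carrier populations of the RS, ES and GS; Pauli blocking, gain and photon dynamics are deliberately omitted, and all time constants and the drive current are illustrative placeholders, not fitted QDL parameters.

```python
# Minimal sketch of the three-level carrier picture of Fig. 1: carrier
# populations only; Pauli blocking, gain and photon dynamics are omitted.
# All time constants and the drive current are illustrative placeholders.
from scipy.integrate import solve_ivp

Q = 1.602e-19                              # elementary charge q (C)
I = 100e-3                                 # injected current (A), illustrative
tau_rs_es, tau_es_gs = 10e-12, 5e-12       # capture and relaxation times (s)
tau_es_rs, tau_gs_es = 200e-12, 50e-12     # thermal re-escape times (s)
tau_sp = 500e-12                           # spontaneous recombination time (s)

def rates(t, n):
    n_rs, n_es, n_gs = n                   # carriers in RS, ES and GS
    dn_rs = I / Q - n_rs / tau_rs_es + n_es / tau_es_rs - n_rs / tau_sp
    dn_es = (n_rs / tau_rs_es - n_es / tau_es_gs - n_es / tau_es_rs
             + n_gs / tau_gs_es - n_es / tau_sp)
    dn_gs = n_es / tau_es_gs - n_gs / tau_gs_es - n_gs / tau_sp
    return [dn_rs, dn_es, dn_gs]

# integrate over a 6 ns window, matching the switch-on traces described later
sol = solve_ivp(rates, (0.0, 6e-9), [0.0, 0.0, 0.0], max_step=1e-11)
print(sol.y[:, -1])                        # carrier numbers at t = 6 ns
```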

Identification and Modeling of Nonlinear Dynamic Systems
Modeling and identifying nonlinear dynamic systems is a challenging task because nonlinear processes share few properties with one another; each system has its own characteristics, which makes describing the different classes of systems in a universal way an open problem [12]. As a simple illustration of the identification and modeling task, Fig. 2 shows a process assumed to have a single input and a single output (SISO), with µ(k) being the system and model input at time instant k; n(k) is the model noise; y(k) is the process output and y′(k) the model output, also known as the system output estimate. The estimated and real system outputs are compared to generate the error e(k), which feeds the model for parameter adjustment and increased reliability. There are several ways to classify modeling techniques; one of them is black-box modeling (or empirical modeling), which uses little or no prior knowledge of the system, so the user does not need to know the physical laws, or the complicated functions and variables, that govern the behavior of the system [8].
Figure 3 shows a flowchart of the main steps of black-box system modeling. It starts by determining the model input and the excitation signal of the system, which is the injected current in the case of the laser investigated here. The model architecture is then chosen according to the limitations and objectives of the study. After that, the design takes place with a usual mathematical representation of dynamic systems, in which the simplest models are preferred over higher-order ones, avoiding an unnecessary increase in complexity and in the number of unknown parameters to estimate. It ends with a validation procedure, in which the model performance is assessed and it is analyzed whether it meets all expectations or additional refinement is required.

Multilayer Perceptron (MLP)
One approach to capturing the nonlinearity in the mathematical modeling of dynamic systems is to use an ANN. ANNs are composed of interconnected layers of processing units called neurons, where the output y of a single neuron with n inputs x_j can be defined by

$$ y = f\Big( \sum_{j=1}^{n} w_j x_j + b \Big), $$

where b is the bias, w_j are the weights (both constant) and f(·) is called an activation function.
In the topology called MLP (Multilayer Perceptron), the output of each neuron feeds the inputs of all neurons in the next layer. Therefore, the output of a network with a single node in the output layer and one hidden layer is a nonlinear function of the parameters, of the type

$$ y = f_s\Big( b_s + \sum_{i=1}^{m} w_i \, f_i\Big( b_i + \sum_{j=1}^{n} w_{ij} x_j \Big) \Big), $$

where f_s(·) is the activation function of the output-layer neuron for a single-hidden-layer feedforward architecture (Single Hidden Layer Feedforward Neural Network, SLFN). Such a function need not be equal to the f_i(·), i = 1, ..., m, which in turn need not be equal to each other. The parameter b_s is the bias term of the output-layer neuron, b_i is the bias of the i-th hidden neuron, w_i are the weights applied to the output of each hidden-layer neuron, and w_ij is the weight of input j as seen by the i-th hidden neuron. It is worth mentioning that an MLP must undergo a training phase to define suitable synaptic weights, thus enabling the modeling itself. A popular method for that is the error backpropagation algorithm [5], which basically has two phases: in the first, the synaptic weights of the network are randomly initialized and the input propagates through the network, layer by layer, up to the output. At this stage, changes are confined to the activation potentials and outputs of the neurons in the network.
In the second phase of training, backpropagation occurs: the network output is compared with the desired output and an error signal is produced. The resulting error signal is propagated through the network, again layer by layer, but this time backwards, resulting in successive adjustments of the ANN synaptic weights [5].
Other examples of training functions that can be applied to the adjustment of the synaptic weights, and that are used in the current work, are Levenberg-Marquardt [13], Bayesian Regularization [14] and Scaled Conjugate Gradient [15].
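As a concrete illustration of the SLFN forward pass written above (the first phase of each training iteration), the following is a minimal numpy sketch; the layer sizes, tanh hidden activations and linear output neuron are assumptions made for the example, not choices stated in this work.

```python
# Minimal numpy sketch of the SLFN forward pass: m hidden neurons with
# weights w_ij and biases b_i, one output neuron with weights w_i and
# bias b_s. Tanh hidden activations and a linear output are assumed.
import numpy as np

rng = np.random.default_rng(0)
n_in, m = 4, 10                       # inputs and hidden neurons, illustrative
W_hid = rng.normal(size=(m, n_in))    # w_ij, randomly initialized
b_hid = rng.normal(size=m)            # b_i
w_out = rng.normal(size=m)            # w_i
b_out = 0.0                           # b_s

def slfn(x):
    h = np.tanh(W_hid @ x + b_hid)    # hidden-layer outputs f_i(.)
    return w_out @ h + b_out          # output neuron f_s(.) taken as linear

print(slfn(np.ones(n_in)))            # forward pass for a dummy input
```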

Extreme Learning Machine (ELM)
Another ANN that can learn the input-output nonlinear mapping is the ELM, a learning technique for training SLFNs. Compared to other ANN techniques, its main difference is a significant increase in training speed through the random generation of the input weights and hidden-layer biases [6]. Essentially, the nonlinear relation between input and output is written according to the following equations:

$$ f(x) = h(x)\,\beta, \qquad h_i(x) = G(a_i, b_i, x), $$
$$ \min_{\beta \in \mathbb{R}^{L \times m}} \| H\beta - T \|^2, \qquad (14) $$
$$ \beta = H' T, \qquad (15) $$

where β = [β_1, ..., β_L]^T is the weight vector connecting the hidden layer, with L nodes, to the m ≥ 1 output nodes; h(x) = [h_1(x), ..., h_L(x)] is the nonlinear mapping function; G(a_i, b_i, x), with hidden-node parameters (a_i, b_i), is a nonlinear continuous function; H is the output matrix of the hidden layer (randomly generated); T is the training data matrix; ‖·‖ is the Frobenius norm; and H′ denotes the Moore-Penrose generalized inverse of H.
In other words, ELM trains an SLFN in two main phases: the first related to the random mapping of the features and the second to the linear solution of the output parameters. In the first phase, ELM randomly initializes the hidden layer to map the input data into a feature space through some nonlinear mapping function, which can be any nonlinear continuous function (Gaussian, multiquadric, hyperbolic tangent, etc.) [16].
In the second phase, the weights connecting the hidden layer to the output layer, denoted by β, are estimated by minimizing the approximation error in the least-squares sense, as described in Eq. (14), with the optimal solution given by Eq. (15).
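The two phases condense into a few lines of linear algebra. The sketch below, with illustrative sizes and synthetic data, assumes a tanh mapping function G and uses numpy's pseudo-inverse for Eq. (15); it is not the exact configuration used in this work.

```python
# Minimal ELM training sketch: random hidden layer, then the least-squares
# solution beta = pinv(H) @ T of Eqs. (14)-(15). Sizes/data are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 4))               # training inputs (illustrative)
T = np.sin(X.sum(axis=1, keepdims=True))     # training targets (illustrative)

L = 100                                      # hidden nodes
A = rng.normal(size=(4, L))                  # random input weights a_i
b = rng.normal(size=L)                       # random hidden biases b_i
H = np.tanh(X @ A + b)                       # hidden-layer output matrix H
beta = np.linalg.pinv(H) @ T                 # Moore-Penrose solution, Eq. (15)

# prediction on new data reuses the same random mapping
y_hat = np.tanh(X @ A + b) @ beta
print(float(np.sqrt(np.mean((y_hat - T) ** 2))))   # training RMSE
```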

Echo State Network - Extreme Learning Machine (ESN-ELM)
Despite the attractiveness of the structural simplicity and efficiency of the ELM training process, its performance can be improved by using an Echo State Network (ESN) [17] as the input layer; this leads to a hybrid ESN-ELM model with an extra layer (the reservoir) to enhance the nonlinear characteristics of the ANN, combined with an ELM acting as the output combiner [7]. The ESN is an ANN with three distinct layers: input layer, reservoir and output layer. The ESN recursion (without output feedback connections) reads as follows:

$$ x(n+1) = f\big( W_{in}\, u(n+1) + W\, x(n) \big), $$
$$ y(n) = W_{out}\, [x(n);\, u(n)], $$

where u(n), x(n) and y(n) are the input, the internal state of the reservoir and the output vector, respectively; W_in ∈ R^{N×K}, W ∈ R^{N×N} and W_out ∈ R^{L×(N+K)} are the weights of the input, reservoir and output layers, respectively; the K input units provide the external stimulation u(n) ∈ R^K; and N refers to the number of internal (reservoir) units [17].
Additionally, for an ESN to work properly, it is vital to choose proper initialization parameters. One of them is the reservoir size, the number of internal neurons in the ESN reservoir; it is probably the most decisive aspect, because it directly determines the maximum number of synaptic connections. Very small reservoirs result in an inability to model accurately, while very large reservoirs can cause problems related to overfitting.
Another parameter is the connectivity σ, the fraction of recurrent connections present in the reservoir, which affects the reservoir complexity. Too many connections can reduce the diversity of reservoir states, hurting ESN training capability. In practice it is common to use values in the range [0.01; 0.2] [18].
The third main parameter is the spectral radius ρ, which sets the length of the reservoir "memory". The larger the spectral radius, the longer previous inputs influence the current output; hence, to ensure stable dynamics, the spectral radius ρ, defined as the largest absolute eigenvalue of the connection weight matrix W, must be properly controlled.
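A minimal sketch of how these three parameters enter the reservoir construction, together with the state update written earlier, is given below; the specific values of N, σ and ρ are illustrative, not the ones adopted in this study.

```python
# Sketch of reservoir initialization with the three parameters above
# (reservoir size N, connectivity sigma, spectral radius rho) and the
# state update x(n+1) = tanh(W_in u(n+1) + W x(n)). Values illustrative.
import numpy as np

rng = np.random.default_rng(2)
N, K = 300, 4                 # reservoir size and input units (illustrative)
sigma, rho = 0.1, 0.9         # connectivity and target spectral radius

W = rng.normal(size=(N, N)) * (rng.uniform(size=(N, N)) < sigma)
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # rescale to radius rho
W_in = rng.uniform(-0.5, 0.5, size=(N, K))

def update(x, u):
    return np.tanh(W_in @ u + W @ x)

x = np.zeros(N)
for u in rng.uniform(size=(50, K)):   # drive the reservoir with dummy inputs
    x = update(x, u)
print(x[:5])                          # a few components of the final state
```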
Another hybrid alternative using the ESN would be the ESN-MLP. Although it improves the mapping ability, and consequently the performance of the ANN, this alternative makes the ESN lose one of its main advantages, namely the simplicity of the training process, implying a longer training time [19].

Previous Experiment and Characteristics of Experimental Data
The present study used measurements made available by the Semiconductor Optics Group of the Technical University of Darmstadt (Germany). The experiment basically consists of injecting direct current into the QDL while controlling its operating temperature with a dedicated controller. The QDL optical power passes through an optical isolator (so that the emitted power travels in only one direction) and then an interference filter (to narrow the spectral band), after which it is detected by an optical power meter and sampled by a fast photodetector, as shown in [20]. The results of the experiment consist of 147 time series of the switch-on transient (step response) of an InAs/InGaAs QDL, separated into GS, ES and total output power. Each time series has 510 samples, corresponding to the 6 ns long duty-time of a square-wave-like electrical drive.
The time series were acquired for different current amplitudes, with the upper level changed progressively in steps of 1 mA from 64.9 mA up to 139.9 mA (at 20 °C), illustrated in Figs. 4 and 5, and from 114.9 mA up to 185 mA (at 40 °C), shown in Figs. 6 and 7.
Thus, Figs. 4 and 5 illustrate the output optical power (y axis) of the GS and ES, respectively, highlighting the predominance of GS emission at the operating temperature of 20 °C. In this case, the ES is only capable of emitting power for injection currents (x axis) above 100 mA, and even then only for a short period of time (z axis).
In contrast, Figs. 6 and 7 show the change in the dual-state emission (DSE) when the operating temperature is increased: the ES emission becomes more relevant, to the detriment of the GS emission. Another relevant aspect is the temporal evolution of the emission; at 40 °C the ES starts emitting before the GS, and the two only emit simultaneously at the end of the sampled period, at higher currents, above 160 mA.

Proposed Model
The proposed black-box model consists of an ANN implementing the nonlinear function φ in Eq. (19); it represents the mapping from the input space, containing the total optical power P(n), the electrical current I, the operating temperature T and the time vector t(n), to the output space, containing the GS and ES optical power at each discrete time instant t(n):

$$ [\,P_{GS}(n),\; P_{ES}(n)\,] = \varphi\big(P(n),\, I,\, T,\, t(n)\big). \qquad (19) $$

In turn, Eq. (20) describes the input optical power P(n) and shows that the model needs a memory going back to sample n − k of the total optical power P_T to support the time-series prediction:

$$ P(n) = [\,P_T(n),\; P_T(n-1),\; \ldots,\; P_T(n-k)\,]. \qquad (20) $$
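A sketch of how one input sample of Eqs. (19)-(20) could be assembled is shown below; the function name and the dummy data are illustrative, with p_total standing in for the measured total-power trace.

```python
# Sketch of assembling the model inputs of Eq. (20): the current total
# power plus delayed samples, together with I, T and t(n). Illustrative.
import numpy as np

def build_inputs(p_total, t, I, T, lags):
    """One row per time step n: [P_T(n-lag) for each lag, I, T, t(n)]."""
    k = max(lags)
    rows = []
    for n in range(k, len(p_total)):
        delayed = [p_total[n - lag] for lag in lags]
        rows.append(delayed + [I, T, t[n]])
    return np.asarray(rows)

p_total = np.random.default_rng(3).uniform(size=510)  # dummy 510-sample trace
t = np.linspace(0.0, 6e-9, 510)                       # 6 ns switch-on window
X = build_inputs(p_total, t, I=123e-3, T=20.0, lags=[0, 1, 2])
print(X.shape)                                        # (508, 6)
```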

Design parameters
To determine the number of delayed input samples in Eq. (20), k, we calculated the Auto-Correlation Function (ACF) and the Partial Auto-Correlation Function (PACF) [21] of the vector P_T. It is thus possible to choose the lags with the highest correlation, after sorting them in descending order and excluding values of low relevance, that is, keeping only the lags that in theory help the ANN predict the system dynamics, regardless of its order.
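As a hedged sketch of this lag-selection step, the snippet below uses the PACF estimator from statsmodels on a dummy series standing in for P_T; in the actual study the PACF is computed on the measured total-power vector.

```python
# Sketch of PACF-based lag selection: estimate partial autocorrelations,
# then rank lags by |PACF| and keep the strongest as delayed inputs.
import numpy as np
from statsmodels.tsa.stattools import pacf

p_total = np.random.default_rng(4).normal(size=510)   # placeholder for P_T
coeffs = pacf(p_total, nlags=10)                      # coeffs[0] is lag 0
ranked = sorted(range(1, len(coeffs)), key=lambda k: -abs(coeffs[k]))
print(ranked[:5])                                     # five strongest lags
```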
Another design parameter investigated was the size of the hidden layer, which ranged from 1 to 1,000 hidden nodes (HN) with different activation functions. For the ELM model, the Sinusoidal (Sine), Radial Basis (Radbas) and Triangular Basis (Tribas) functions were used, while in the ESN-ELM model the Hyperbolic Tangent (Tanh), Sine and Tribas functions were used.
The data were split into training (70%) and testing (30%) samples to develop the models. Additionally, for the best performance of the algorithms, and to avoid distortions during training caused by input variables with a much larger dynamic range than the others, all training and testing input samples were normalized to the interval [0; 1].
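The sketch below illustrates the 70/30 split and the [0; 1] min-max normalization; fitting the scaler on the training portion only is an assumption of good practice on our part, as the text does not specify this detail.

```python
# Minimal sketch of a 70/30 split with min-max scaling to [0, 1],
# fitted on the training portion only (an assumed precaution).
import numpy as np

def minmax(train, test):
    lo, hi = train.min(axis=0), train.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (train - lo) / scale, (test - lo) / scale

X = np.random.default_rng(5).uniform(-3.0, 7.0, size=(510, 6))  # dummy inputs
n_train = int(0.7 * len(X))
X_train, X_test = minmax(X[:n_train], X[n_train:])
print(X_train.min(), X_train.max())           # 0.0 and 1.0 on training data
```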

Performance Evaluation
This work used one of the main performance indexes in regression-model evaluation, the Root Mean Square Error (RMSE), defined by

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{m} \sum_{n=1}^{m} e_n^2 }, $$

where e_n is the output error at time n, obtained as the difference between the real value and the value estimated by the ANN over the set of m output samples evaluated. The Mean Absolute Percentage Error (MAPE) is also used, defined by

$$ \mathrm{MAPE} = \frac{1}{m} \sum_{n=1}^{m} |EP_n|, $$

where EP_n is the percentage error of the output at time n and m is again the number of output samples evaluated.
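Both metrics as reconstructed above translate directly into code; the following is a minimal sketch where y and y_hat stand for the measured and ANN-estimated output vectors of length m (dummy values here).

```python
# Direct implementation of the two metrics defined above.
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    # mean of the absolute percentage errors |EP_n|
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

y = np.array([10.0, 12.0, 9.5])        # dummy measured outputs
y_hat = np.array([9.8, 12.3, 9.6])     # dummy model estimates
print(rmse(y, y_hat), mape(y, y_hat))
```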

Results
First, as mentioned in Subsection 3.3, we used the ACF and the PACF of the vector P_T to determine k in Eq. (20). Fig. 9 shows the PACF, where the five lagged values with the highest correlation are those of 1st, 2nd, 3rd, 5th and 6th order (which were used in the model); the 4th lagged value has no relevant correlation and was discarded as a model input. The best MLP result uses the Levenberg-Marquardt training function with no delayed input power (k in Eq. (20)) and 48 neurons (Ne), resulting in an RMSE of 8.40 µW and a MAPE of 1.32%. For this training function, increasing the model order was not a good choice: it increased the complexity through the insertion of more delayed input powers, and the performance got worse.
In contrast, the MLP with the Bayesian Regularization training function has its best result with 1 (one) delayed input power and 46 neurons, with an RMSE of 8.69 µW and a MAPE of 1.46%. That result is, however, 3.45% worse than the one obtained with the Levenberg-Marquardt function.
With the Scaled Conjugate Gradient function the MLP had its worst results; its best configuration uses 3 (three) delayed input powers with only 38 neurons (fewer than the average of the other functions), with an RMSE of 9.10 µW and a MAPE of 2.45%.
On the other hand, the ELM model showed lower error than the MLP for all quantities of delayed input powers, and its best overall performance uses 2 (two) delayed input powers with 889 hidden nodes (HN) and Radbas as the activation function, resulting in an RMSE of 2.81 µW and a MAPE of 0.50%, with a processing time (training plus testing) of 15.27 s. This means the ELM has an error 65% lower than that of the MLP, showing its superiority for this application, as well as a processing time much lower than that of the rate-equation model.
The best results with the Sine and Tribas activation functions are an RMSE of 2.89 µW and a MAPE of 0.54%, and an RMSE of 3.29 µW and a MAPE of 0.66%, respectively. They are inferior to the results with Radbas, but still superior to the best results obtained with the MLP.
Another issue is the relationship between the number of hidden nodes and the number of delayed optical powers, which showed, in general, an inversely proportional trend in the ELM model (except with the Tribas activation function). This can help in choosing the configuration to be implemented in hardware in future works, provided that the processing capacity and memory available in the device are sufficient.
Despite the expectation of a performance improvement from combining the ESN with the ELM, the result obtained was inferior to the ELM model; error levels similar to those of the other models were reached only with the Hyperbolic Tangent (Tanh) activation function. In this configuration, the model had its best result with an RMSE of 5.56 µW and a MAPE of 1.02%, with k = 1 and 144 hidden nodes.
The other two activation functions used, Sine and Tribas, had RMSE values much higher than those found in the other models, 43.69 µW and 196.25 µW, respectively, at their best performance, though the corresponding MAPE values were not as distorted, at 2.93% and 6.08%, respectively. These results show how important it is to use more than one metric for model validation, thus preventing inadequate conclusions.
The main reason for the low performance of the ESN-ELM model is the use of non-optimized values for the parameters of its ESN part. One way of mitigating the distortions found, and consequently increasing the performance of the model, is the use of optimization algorithms such as Particle Swarm Optimization (PSO) [22] and Artificial Bee Colony (ABC) [23].
In addition to the validation metrics used, and for better visualization of the performance achieved in this study, Fig. 10 shows the experimental and model-predicted time series of a switch-on transient experiment with I = 123 mA at 20 °C (a) and I = 174 mA at 40 °C (b). The rather good agreement between the solid (experimental) and dashed (predicted) lines, which we stress was obtained with a small input memory lag (k = 2), suggests that the proposed approach is a good candidate for the embedded electronics of low-resolution spectral analyzers.

Conclusion
The alternative of filtering the QDL optical power with an ANN black-box model, using only the total power, temperature and current as inputs, is promising. The GS and ES optical powers, that is, the dynamic bi-state emission, were predicted with ELM- and MLP-based approaches producing very low prediction errors. Additionally, they presented processing times on the order of seconds, much lower than those of the rate-equation model. The ESN-ELM model, contrary to expectations, presented, in most of the tested configurations, results much worse than those of the other ANNs. It is thus necessary to evaluate its performance using optimization algorithms such as PSO and ABC, and then to verify again the feasibility of its application in the model.
The success of the alternative model presented is relevant in the context of optical filtering instrumentation, with the possibility of exploring the concept in a future implementation. It then becomes possible to develop low-resolution, low-cost filters for applications in which it is not necessary and/or economically viable to use a spectrum analyzer, which can be replaced by a simple optical power meter.
As future work, we will investigate a case of higher spectral resolution, at least in continuous-wave operation, checking the ability of the inference machine to predict different lasing lines around the GS and ES. The experimental data necessary for this investigation are available and currently under preparation.