Given a sample \(\mathcal{Y}\) = {Y1, Y2,…, YN} of size N in which all Yi are mutually independent and identically distributed (iid), suppose X = (x1, x2,…, xn)T is an n×1 random vector that is normally distributed with mean vector µ and covariance matrix Σ as follows:
\(f\left(X|\mu ,{\Sigma }\right)={\left(2\pi \right)}^{-\frac{n}{2}}{\left|{\Sigma }\right|}^{-\frac{1}{2}}\exp\left(-\frac{1}{2}{\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}(X-\mu )\right)\) (2.1)
Where the superscript “T” denotes the vector (matrix) transposition operator. The n×1 mean vector µ and the n×n covariance matrix Σ are:
$$\mu =\left(\begin{array}{c}{\mu }_{1}\\ {\mu }_{2}\\ ⋮\\ {\mu }_{n}\end{array}\right),{\Sigma }=\left(\begin{array}{cccc}{\sigma }_{1}^{2}& {\sigma }_{12}^{2}& \cdots & {\sigma }_{1n}^{2}\\ {\sigma }_{21}^{2}& {\sigma }_{2}^{2}& \cdots & {\sigma }_{2n}^{2}\\ ⋮& ⋮& \ddots & ⋮\\ {\sigma }_{n1}^{2}& {\sigma }_{n2}^{2}& \cdots & {\sigma }_{n}^{2}\end{array}\right)$$
Note, σij2 is the covariance of xi and xj whereas σi2 is the variance of xi. Let Y = (y1, y2,…, ym)T be the random variable representing every sample random variable Yi = (yi1, yi2,…, yim)T. Note, X is an n×1 vector and Y is an m×1 vector. Suppose there is an assumption that Y is a combination of partial random variables (components) of X such that:
$${y}_{i}={\alpha }_{i0}+\sum _{j=1}^{n}{\alpha }_{ij}{x}_{j}$$
As a generalization, let A be the m×(n+1) matrix whose elements are called regressive coefficients, where A0 denotes its first column and \(\tilde{A}\) denotes the remaining m×n block:
\(A=\left(\begin{array}{ccccc}{\alpha }_{10}& {\alpha }_{11}& {\alpha }_{12}& \cdots & {\alpha }_{1n}\\ {\alpha }_{20}& {\alpha }_{21}& {\alpha }_{22}& \cdots & {\alpha }_{2n}\\ ⋮& ⋮& ⋮& \ddots & ⋮\\ {\alpha }_{m0}& {\alpha }_{m1}& {\alpha }_{m2}& \cdots & {\alpha }_{mn}\end{array}\right)\) \({A}_{0}=\left(\begin{array}{c}{\alpha }_{10}\\ {\alpha }_{20}\\ ⋮\\ {\alpha }_{m0}\end{array}\right),\tilde{A}=\left(\begin{array}{cccc}{\alpha }_{11}& {\alpha }_{12}& \cdots & {\alpha }_{1n}\\ {\alpha }_{21}& {\alpha }_{22}& \cdots & {\alpha }_{2n}\\ ⋮& ⋮& \ddots & ⋮\\ {\alpha }_{m1}& {\alpha }_{m2}& \cdots & {\alpha }_{mn}\end{array}\right)\)
This implies
\(Y={A}_{0}+\tilde{A}X\) (2.2)
As a convention, let:
$${\tilde{A}}_{i}={\left({\alpha }_{i1},{\alpha }_{i2},\dots ,{\alpha }_{in}\right)}^{T}$$
Then
$$\tilde{A}=\left(\begin{array}{c}{\tilde{A}}_{1}^{T}\\ {\tilde{A}}_{2}^{T}\\ ⋮\\ {\tilde{A}}_{m}^{T}\end{array}\right)$$
$${y}_{i}={\alpha }_{i0}+{\tilde{A}}_{i}^{T}X$$
The equation above is the regression function, in which Y is called the responsor, X is called the regressor, and A is called the regressive matrix. This assumption is the combinatorial assumption (CA) mentioned above, and the method proposed here is called the CA method or CA algorithm. Suppose Y is normally distributed with mean \({A}_{0}+\tilde{A}X\) and covariance matrix S as follows:
\(f\left(Y|X,A,S\right)={\left(2\pi \right)}^{-\frac{m}{2}}{\left|S\right|}^{-\frac{1}{2}}\exp\left(-\frac{1}{2}{\left(Y-{A}_{0}-\tilde{A}X\right)}^{T}{S}^{-1}(Y-{A}_{0}-\tilde{A}X)\right)\) (2.3)
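To make the model concrete, the following minimal sketch (illustrative only; the dimensions, parameter values, and variable names are arbitrary choices rather than part of the method) simulates N iid observations Y1, …, YN from equations 2.1, 2.2, and 2.3:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 100                      # sizes of X, Y and the sample (toy values)

mu = np.array([0.5, -1.0, 2.0])          # mean of X
Sigma = np.diag([1.0, 0.5, 2.0])         # covariance of X (diagonal only for simplicity)
A0 = np.array([0.1, -0.2])               # intercept vector A0
A_tilde = rng.normal(size=(m, n))        # regressive matrix (m x n)
S = 0.1 * np.eye(m)                      # covariance of Y given X

# Draw N iid pairs: X_i ~ N(mu, Sigma), Y_i | X_i ~ N(A0 + A_tilde X_i, S)
X = rng.multivariate_normal(mu, Sigma, size=N)            # N x n
noise = rng.multivariate_normal(np.zeros(m), S, size=N)   # N x m
Y = A0 + X @ A_tilde.T + noise                             # N x m, equation 2.2 plus Gaussian noise
```

In the CA setting only the sample Y1, …, YN is observed; X is hidden and is integrated out through the EM process developed below.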
The m×m covariance matrix S is:
$$S=\left(\begin{array}{cccc}{s}_{1}^{2}& {s}_{12}^{2}& \cdots & {s}_{1m}^{2}\\ {s}_{21}^{2}& {s}_{2}^{2}& \cdots & {s}_{2m}^{2}\\ ⋮& ⋮& \ddots & ⋮\\ {s}_{m1}^{2}& {s}_{m2}^{2}& \cdots & {s}_{m}^{2}\end{array}\right)$$
Note, sij2 is the covariance of yi and yj whereas si2 is the variance of yi. As a convention, let:
$${S}_{i}={\left({s}_{i1}^{2},{s}_{i2}^{2},\dots ,{s}_{im}^{2}\right)}^{T}$$
Then
$$S=\left(\begin{array}{c}{S}_{1}^{T}\\ {S}_{2}^{T}\\ ⋮\\ {S}_{m}^{T}\end{array}\right)=\left({S}_{1},{S}_{2},\dots ,{S}_{m}\right)$$
As a convention, \({\stackrel{-}{S}}_{i}\) is defined like Si except that S is replaced by S−1; note that \({\stackrel{-}{S}}_{i}\) is a vector. Similarly, the notations \({\stackrel{-}{s}}_{ij}^{2}\) and \({\stackrel{-}{s}}_{i}^{2}\) carry the analogous meaning. The marginal PDF of Y is now defined with the support of the regression model as follows:
$$f\left(Y|{\Theta }\right)=\underset{{\mathbb{R}}^{n}}{\int }f\left(X,Y|{\Theta }\right)\text{d}X\stackrel{\scriptscriptstyle\text{def}}{=}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|\mu ,{\Sigma }\right)f\left(Y|X,A,S\right)\text{d}X$$
Where the parameter Θ = (µ, Σ, A, S)T is a compound parameter. The equation above is not a real total probability rule, but it implies that the conditional PDF f(Y | X) is substituted by the regression model. Consequently, the expectation Q(Θ | Θ(t)) becomes:
\(Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\log\left(f\left(X|\mu ,{\Sigma }\right)f\left({Y}_{i}|X,A,S\right)\right)\text{d}X\) (2.4)
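For illustration only, the integrand of equation 2.4, namely the complete-data log-likelihood log(f(X|µ,Σ)f(Y|X,A,S)), can be evaluated directly from the two Gaussian densities. The sketch below assumes SciPy is available; the function name and argument names are arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal

def complete_data_loglik(X, Y, mu, Sigma, A0, A_tilde, S):
    """log( f(X | mu, Sigma) * f(Y | X, A, S) ), the quantity inside the integral of equation 2.4."""
    log_fX = multivariate_normal.logpdf(X, mean=mu, cov=Sigma)              # equation 2.1
    log_fY = multivariate_normal.logpdf(Y, mean=A0 + A_tilde @ X, cov=S)    # equation 2.3
    return log_fX + log_fY
```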
It is necessary to specify the conditional PDF f(X | Y, Θ). Indeed, we have:
$$f\left(X|Y,{\Theta }\right)=\frac{f\left(X|\mu ,{\Sigma }\right)f\left(Y|X,A,S\right)}{\underset{{\mathbb{R}}^{n}}{\int }f\left(X|\mu ,{\Sigma }\right)f\left(Y|X,A,S\right)\text{d}X}=\frac{f\left(X,Y|{\Theta }\right)}{f\left(Y|{\Theta }\right)}$$
The joint PDF f(X, Y | Θ), which is the numerator of f(X | Y, Θ), is defined as:
$$f\left(X,Y|{\Theta }\right)\stackrel{\scriptscriptstyle\text{def}}{=}f\left(X|\mu ,{\Sigma }\right)f\left(Y|X,A,S\right)={\left(2\pi \right)}^{-\frac{n+m}{2}}{\left(\left|{\Sigma }\right|\left|S\right|\right)}^{-\frac{1}{2}}\text{e}\text{x}\text{p}\left(-\frac{1}{2}\left({\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\left(Y-{A}_{0}\right)-2{\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\tilde{A}\mu \right)\right)*{f}_{0}\left(X,Y|{\Theta }\right)$$
Where,
$${f}_{0}\left(X,Y|{\Theta }\right)=\text{e}\text{x}\text{p}\left(-\frac{1}{2}\left({\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}\left(X-\mu \right)-2{\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\tilde{A}\left(X-\mu \right)+{\left(\tilde{A}\mu \right)}^{T}{S}^{-1}\tilde{A}\mu \right)\right)$$
The expression \({\left(\tilde{A}X\right)}^{T}{S}^{-1}\tilde{A}X\) is approximated by replacing X with µ as follows:
$${\left(\tilde{A}X\right)}^{T}{S}^{-1}\tilde{A}X\cong {\left(\tilde{A}\mu \right)}^{T}{S}^{-1}\tilde{A}\mu$$
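For clarity, the role of this term can be seen by expanding the quadratic form in the exponent of equation 2.3:
$${\left(Y-{A}_{0}-\tilde{A}X\right)}^{T}{S}^{-1}\left(Y-{A}_{0}-\tilde{A}X\right)={\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\left(Y-{A}_{0}\right)-2{\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\tilde{A}X+{\left(\tilde{A}X\right)}^{T}{S}^{-1}\tilde{A}X$$
Writing \(\tilde{A}X=\tilde{A}\left(X-\mu \right)+\tilde{A}\mu\) in the middle term and replacing the last term by \({\left(\tilde{A}\mu \right)}^{T}{S}^{-1}\tilde{A}\mu\) produces exactly the factorization of f(X, Y | Θ) into the X-free factor and f0(X, Y | Θ) shown above.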
As a result, f0(X, Y | Θ) and f(X, Y | Θ) are approximated as follows:
$${f}_{0}\left(X,Y|{\Theta }\right)\cong \text{e}\text{x}\text{p}\left(-\frac{1}{2}\left({\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}\left(X-\mu \right)-2{\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\tilde{A}\left(X-\mu \right)\right)\right)$$
And
$$f\left(X,Y|{\Theta }\right)\cong {\left(2\pi \right)}^{-\frac{n+m}{2}}{\left(\left|{\Sigma }\right|\left|S\right|\right)}^{-\frac{1}{2}}\text{e}\text{x}\text{p}\left(-\frac{1}{2}\left({\left(Y-{A}_{0}-\tilde{A}\mu \right)}^{T}{S}^{-1}\left(Y-{A}_{0}-\tilde{A}\mu \right)\right)\right)*{f}_{0}\left(X,Y|{\Theta }\right)$$
The approximation that removes the X-dependency from the expression \({\left(\tilde{A}X\right)}^{T}{S}^{-1}\tilde{A}X\) is reasonable because the PDF f(X | Y, Θ) already contains the second-order proportion through the built-in expression \({\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}\left(X-\mu \right)\), and this PDF also reflects the regression model through another built-in expression \({\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\tilde{A}\left(X-\mu \right)\) that includes the parameter S. In other words, the dependency of \({\left(\tilde{A}X\right)}^{T}{S}^{-1}\tilde{A}X\) on X is unnecessary. Moreover, the EM process will adjust the parameters in the best way later. In the following proofs and computations, we will see that such dependency removal also makes it easy to apply the shifted Gaussian integral.
The denominator of f(X | Y, Θ), which is f(Y | Θ), is the integral of f(X, Y | Θ) over X:
$$f\left(Y|{\Theta }\right)=\underset{{\mathbb{R}}^{n}}{\int }f\left(X,Y|{\Theta }\right)\text{d}X={\left(2\pi \right)}^{-\frac{n+m}{2}}{\left(\left|{\Sigma }\right|\left|S\right|\right)}^{-\frac{1}{2}}\text{e}\text{x}\text{p}\left(-\frac{1}{2}\left({\left(Y-{A}_{0}-\tilde{A}\mu \right)}^{T}{S}^{-1}\left(Y-{A}_{0}-\tilde{A}\mu \right)\right)\right)B$$
Where B is the integral of f0(X, Y | Θ) over X:
$$B=\underset{{\mathbb{R}}^{n}}{\int }{f}_{0}\left(X,Y|{\Theta }\right)\text{d}X\cong \underset{{\mathbb{R}}^{n}}{\int }\text{e}\text{x}\text{p}\left(-\frac{1}{2}\left({\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}\left(X-\mu \right)-2{\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\tilde{A}\left(X-\mu \right)\right)\right)\text{d}X$$
It is necessary to calculate B in order to determine f(X | Y, Θ). By referring to the appendix, we can denote:
$$B\cong E\left({f}_{0}\left(X,Y|{\Theta }\right)\right)$$
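For reference, B is a shifted Gaussian integral; assuming the appendix evaluates it by the usual completing-the-square argument (this closed form is stated here only as a plausible reading), with \(b={\tilde{A}}^{T}{S}^{-1}\left(Y-{A}_{0}\right)\):
$$B\cong \underset{{\mathbb{R}}^{n}}{\int }\exp\left(-\frac{1}{2}\left({\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}\left(X-\mu \right)-2{b}^{T}\left(X-\mu \right)\right)\right)\text{d}X={\left(2\pi \right)}^{\frac{n}{2}}{\left|{\Sigma }\right|}^{\frac{1}{2}}\exp\left(\frac{1}{2}{b}^{T}{\Sigma }b\right)$$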
Obviously, B is totally determined. Thus, f(Y | Θ) is approximated as follows:
$$f\left(Y|{\Theta }\right)\cong {\left(2\pi \right)}^{-\frac{n+m}{2}}{\left(\left|{\Sigma }\right|\left|S\right|\right)}^{-\frac{1}{2}}\text{e}\text{x}\text{p}\left(-\frac{1}{2}\left({\left(Y-{A}_{0}-\tilde{A}\mu \right)}^{T}{S}^{-1}\left(Y-{A}_{0}-\tilde{A}\mu \right)\right)\right)*E\left({f}_{0}\left(X,Y|{\Theta }\right)\right)$$
As a result, the PDF f(X | Y, Θ) is approximated as follows:
$$f\left(X|Y,{\Theta }\right)=\frac{f\left(X,Y|{\Theta }\right)}{f\left(Y|{\Theta }\right)}\cong \frac{1}{E\left({f}_{0}\left(X,Y|{\Theta }\right)\right)}\text{*}\text{e}\text{x}\text{p}\left(-\frac{1}{2}\left({\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}\left(X-\mu \right)-2{\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\tilde{A}\left(X-\mu \right)\right)\right)$$
Let k(Y|Θ) be the quantity that is constant with respect to X but is a function of Y with parameter Θ, defined as follows:
\(k\left(Y|{\Theta }\right)=\frac{1}{E\left({f}_{0}\left(X,Y|{\Theta }\right)\right)}\) (2.5)
Shortly, the conditional PDF f(X | Y, Θ(t)) is specified (approximated) at the E-step of some tth iteration as follows:
\(f\left(X|Y,{{\Theta }}^{\left(t\right)}\right)\stackrel{\scriptscriptstyle\text{def}}{=}k\left(Y|{{\Theta }}^{\left(t\right)}\right)\cdot \exp\left(-\frac{1}{2}\left({\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}\left(X-\mu \right)-2{\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\tilde{A}\left(X-\mu \right)\right)\right)\) (2.6)
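A minimal sketch of evaluating the E-step density of equation 2.6 follows, assuming the completed-square closed form for B sketched above so that k(Y|Θ) = 1/B; the function and variable names are illustrative only:

```python
import numpy as np

def e_step_density(X, Y, mu, Sigma, A0, A_tilde, S):
    """Approximate conditional density f(X | Y, Theta) of equation 2.6, with
    k(Y | Theta) = 1/B taken from the completed-square evaluation of B (an assumption)."""
    b = A_tilde.T @ np.linalg.solve(S, Y - A0)             # b = A~^T S^{-1} (Y - A0)
    n = mu.shape[0]
    log_B = (0.5 * n * np.log(2.0 * np.pi)
             + 0.5 * np.linalg.slogdet(Sigma)[1]
             + 0.5 * b @ Sigma @ b)                        # log of the shifted Gaussian integral
    z = X - mu
    exponent = -0.5 * (z @ np.linalg.solve(Sigma, z) - 2.0 * b @ z)
    return np.exp(exponent - log_B)                        # k(Y | Theta) * exp(...)
```

Under that reading, equation 2.6 is the density of a Gaussian with mean µ + Σb and covariance Σ, which would also give simple closed forms for the appendix expectations used in the M-step below; this interpretation is an assumption, not a statement taken from the appendix.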
Consequently, the expectation Q(Θ | Θ(t)) at the E-step of some tth iteration is totally determined. At the M-step of the current tth iteration, Q(Θ|Θ(t)) is maximized by setting its partial derivatives with regard to Θ to zero. The first-order partial derivative of Q(Θ | Θ(t)) with regard to µ, noting that Q(Θ | Θ(t)) is an analytic function, is:
$$\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial \mu }=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\frac{\partial \text{l}\text{o}\text{g}\left(f\left(X|\mu ,{\Sigma }\right)\right)}{\partial \mu }\text{d}X$$
$$=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right){\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}\text{d}X$$
$$=\left(\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right){\left(X-{\mu }^{\left(t\right)}\right)}^{T}\text{d}X-N{\mu }^{T}+N{\left({\mu }^{\left(t\right)}\right)}^{T}\right){{\Sigma }}^{-1}$$
By referring to the appendix, we have:
$$\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial \mu }=\left(-N{\mu }^{T}+N{\left({\mu }^{\left(t\right)}\right)}^{T}+\sum _{i=1}^{N}E\left(X|f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)\right){{\Sigma }}^{-1}$$
Note, Σ is invertible and symmetric. As a convention, I denote:
$$E\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)=E\left(X|f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)$$
The next parameter µ(t+1) at the M-step of some tth iteration that maximizes Q(Θ|Θ(t)) is the solution of the equation \(\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial \mu }={0}^{T}\), where 0 = (0, 0,…, 0)T is the zero vector, as follows:
\({\mu }^{\left(t+1\right)}={\mu }^{\left(t\right)}+\frac{1}{N}\sum _{i=1}^{N}E\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\) (2.7)
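Once the appendix expectations E(X|Yi, Θ(t)) are available (say, stored row-wise in an N×n array), equation 2.7 becomes a one-line update, as in this illustrative sketch:

```python
import numpy as np

def update_mu(mu_t, EX):
    """Equation 2.7: mu^(t+1) = mu^(t) + (1/N) * sum_i E(X | Y_i, Theta^(t)).
    EX is an N x n array; row i holds the appendix expectation for sample Y_i."""
    return mu_t + EX.mean(axis=0)
```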
The first-order partial derivative of Q(Θ | Θ(t)) with regard to Σ, noting that Q(Θ | Θ(t)) is an analytic function, is:
$$\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial {\Sigma }}=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\frac{\partial \text{l}\text{o}\text{g}\left(f\left(X|\mu ,{\Sigma }\right)\right)}{\partial {\Sigma }}\text{d}X$$
$$=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left(-\frac{1}{2}{{\Sigma }}^{-1}+\frac{1}{2}{{\Sigma }}^{-1}\left(X-\mu \right){\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}\right)\text{d}X$$
The next parameter Σ(t+1) at the M-step of some tth iteration that maximizes Q(Θ|Θ(t)) is the solution of the equation formed by setting \(\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial {\Sigma }}\) to the zero matrix. Let (0) denote the zero matrix:
$$\left(0\right)=\left(\begin{array}{cccc}0& 0& \cdots & 0\\ 0& 0& \cdots & 0\\ ⋮& ⋮& \ddots & ⋮\\ 0& 0& \cdots & 0\end{array}\right)$$
We have:
$$\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial {\Sigma }}=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left(-{\Sigma }+\left(X-{\mu }^{\left(t\right)}\right){\left(X-{\mu }^{\left(t\right)}\right)}^{T}\right)\text{d}X=\left(0\right)$$
Note, µ is replaced by µ(t). Thus, the next parameter Σ(t+1) at the M-step of some tth iteration that maximizes Q(Θ|Θ(t)) is obtained:
$${{\Sigma }}^{\left(t+1\right)}=\frac{1}{N}\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left(X-{\mu }^{\left(t\right)}\right){\left(X-{\mu }^{\left(t\right)}\right)}^{T}\text{d}X$$
By referring to the appendix, we obtain:
\({{\Sigma }}^{\left(t+1\right)}=\frac{1}{N}\sum _{i=1}^{N}E\left(X{X}^{T}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\) (2.8)
As a convention, I denote:
$$E\left(X{X}^{T}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)=E\left(X{X}^{T}|f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)$$
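With the same convention, equation 2.8 is simply an average of the appendix second-moment expectations; the sketch below assumes they are stored as an N×n×n array:

```python
import numpy as np

def update_Sigma(EXXT):
    """Equation 2.8: Sigma^(t+1) = (1/N) * sum_i E(X X^T | Y_i, Theta^(t)).
    EXXT is an N x n x n array; EXXT[i] holds the appendix expectation for sample Y_i."""
    return EXXT.mean(axis=0)
```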
The first-order partial derivative of Q(Θ | Θ(t)) with regard to A0, noting that Q(Θ | Θ(t)) is an analytic function, is:
$${\left(\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial {A}_{0}}\right)}^{T}=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\frac{\partial \log\left(f\left({Y}_{i}|X,A,S\right)\right)}{\partial {A}_{0}}\text{d}X$$
$$={S}^{-1}\left(\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left({Y}_{i}-{A}_{0}-\tilde{A}{\mu }^{\left(t\right)}-\tilde{A}\left(X-{\mu }^{\left(t\right)}\right)\right)\text{d}X\right)$$
$$={S}^{-1}\left(\sum _{i=1}^{N}{Y}_{i}-N{A}_{0}-N\tilde{A}{\mu }^{\left(t\right)}-\tilde{A}\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left(X-{\mu }^{\left(t\right)}\right)\text{d}X\right)$$
By referring to the appendix, we obtain:
$${\left(\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial {A}_{0}}\right)}^{T}={S}^{-1}\left(\sum _{i=1}^{N}{Y}_{i}-N{A}_{0}-N{\tilde{A}}^{\left(t\right)}{\mu }^{\left(t\right)}-{\tilde{A}}^{\left(t\right)}\sum _{i=1}^{N}E\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)$$
Note, \(\tilde{A}\) is replaced by \({\tilde{A}}^{\left(t\right)}\) at the current tth iteration. Therefore, the next parameter A0(t+1) at the M-step of some tth iteration that maximizes Q(Θ|Θ(t)) is obtained by setting the partial derivative \(\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial {A}_{0}}\) to zero:
\({A}_{0}^{\left(t+1\right)}=\frac{1}{N}\left(\sum _{i=1}^{N}{Y}_{i}-N{\tilde{A}}^{\left(t\right)}{\mu }^{\left(t\right)}-N{\tilde{A}}^{\left(t\right)}\left({\mu }^{\left(t+1\right)}-{\mu }^{\left(t\right)}\right)\right)\) (2.9)
Note, the next parameter µ(t+1) is specified by equation 2.7.
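Equation 2.9 also translates directly into code; the sketch below assumes the observed sample is stored as an N×m array Y and reuses the illustrative names introduced earlier:

```python
import numpy as np

def update_A0(Y, A_tilde_t, mu_t, mu_next):
    """Equation 2.9: A0^(t+1) = mean_i(Y_i) - A~^(t) mu^(t) - A~^(t) (mu^(t+1) - mu^(t))."""
    return Y.mean(axis=0) - A_tilde_t @ mu_t - A_tilde_t @ (mu_next - mu_t)
```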
The first-order partial derivative of Q(Θ | Θ(t)) with regard to \(\tilde{A}\), noting that Q(Θ | Θ(t)) is an analytic function, is (Saliba, 2016):
$$\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial \tilde{A}}=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\frac{\partial \log\left(f\left({Y}_{i}|X,A,S\right)\right)}{\partial \tilde{A}}\text{d}X$$
$$=2\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right){\left({Y}_{i}-{A}_{0}-\tilde{A}X\right)}^{T}{S}^{-1}\left({X}^{T}⨂{I}_{m}\right)\text{d}X$$
Where \(⨂\) denotes the Kronecker product and Im is the m×m identity matrix. The next parameter \({\tilde{A}}^{\left(t+1\right)}\) at the M-step of some tth iteration that maximizes Q(Θ|Θ(t)) is obtained by setting the partial derivative \(\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial \tilde{A}}\) to zero. Because the Kronecker product \({X}^{T}⨂{I}_{m}\) occurs, setting \(\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial \tilde{A}}\) equal to zero is equivalent to one equation \({u}_{j\ne 1}\left(\tilde{A}\right)=0\) and n equations \({v}_{j}\left(\tilde{A}\right)=0\), where
$${u}_{j\ne 1}\left(\tilde{A}\right)={\stackrel{-}{S}}_{j}^{T}\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left({Y}_{i}-{A}_{0}-\tilde{A}X\right)\text{d}X$$
$$={\stackrel{-}{S}}_{j}^{T}\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left({Y}_{i}-{A}_{0}-\tilde{A}\left(X-\mu \right)-\tilde{A}\mu \right)\text{d}X$$
$$={\stackrel{-}{S}}_{j}^{T}\left(\sum _{i=1}^{N}\left({Y}_{i}-{A}_{0}\right)-N\tilde{A}\left(\mu +E\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)\right)$$
And
$${v}_{j}\left(\tilde{A}\right)={\stackrel{-}{S}}_{1}^{T}\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left({Y}_{i}-{A}_{0}-\tilde{A}X\right){x}_{j}\text{d}X$$
$$={\stackrel{-}{S}}_{1}^{T}\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left(\left({Y}_{i}-{A}_{0}\right)\left({x}_{j}-{\mu }_{j}\right)+\left({Y}_{i}-{A}_{0}\right){\mu }_{j}-\tilde{A}X\right){x}_{j}\text{d}X$$
$$={\stackrel{-}{S}}_{1}^{T}\left(\sum _{i=1}^{N}\left({Y}_{i}-{A}_{0}\right)\left(E\left({x}_{j}|f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)+{\mu }_{j}\right)-\tilde{A}E\left({x}_{j}X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)$$
$$={\stackrel{-}{S}}_{1}^{T}\left(\sum _{i=1}^{N}\left({Y}_{i}-{A}_{0}\right)\left(E\left({x}_{j}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)+{\mu }_{j}\right)-N\tilde{A}\stackrel{-}{E}\left({x}_{j}X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)$$
Where,
$$\stackrel{-}{E}\left({x}_{j}X|f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)=\left(\begin{array}{c}\stackrel{-}{E}\left({x}_{1}{x}_{j}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\\ \stackrel{-}{E}\left({x}_{2}{x}_{j}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\\ ⋮\\ \stackrel{-}{E}\left({x}_{n}{x}_{j}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\end{array}\right)$$
It is necessary to determine every partial expectation \(\stackrel{-}{E}\left({x}_{i}{x}_{j}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\) to determine the expectation \(\stackrel{-}{E}\left({x}_{j}X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\). Indeed, by referring to the appendix, we have:
$$\stackrel{-}{E}\left({x}_{i}{x}_{j}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)=\stackrel{-}{E}\left(\left({x}_{i}-{\mu }_{i}^{\left(t\right)}\right)\left({x}_{j}-{\mu }_{j}^{\left(t\right)}\right)+{x}_{i}{\mu }_{j}^{\left(t\right)}+{x}_{j}{\mu }_{i}^{\left(t\right)}-{\mu }_{i}^{\left(t\right)}{\mu }_{j}^{\left(t\right)}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)$$
$$=E\left({x}_{i}{x}_{j}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)+{\mu }_{i}^{\left(t\right)}E\left({x}_{j}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)+{\mu }_{j}^{\left(t\right)}E\left({x}_{i}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)-{\mu }_{i}^{\left(t\right)}{\mu }_{j}^{\left(t\right)}$$
As a result, the next parameter \({\tilde{A}}^{\left(t+1\right)}\) is the solution of the following n+1 equations:
\({\tilde{A}}^{\left(t+1\right)}\stackrel{\scriptscriptstyle\text{def}}{=}\left\{\begin{array}{l}\tilde{A}\left({\mu }^{\left(t\right)}+E\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)=\frac{1}{N}\sum _{i=1}^{N}\left({Y}_{i}-{A}_{0}\right)\\ {\left.\tilde{A}\stackrel{-}{E}\left({x}_{j}X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)=\frac{1}{N}\sum _{i=1}^{N}\left({Y}_{i}-{A}_{0}\right)\left(E\left({x}_{j}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)+{\mu }_{j}^{\left(t\right)}\right)\right|}_{j=\stackrel{-}{1,n}}\end{array}\right.\) (2.10)
Note, µ is replaced by the current µ(t). Equation 2.10 can be solved by the Newton-Raphson method.
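Because equation 2.10 is linear in the unknown \(\tilde{A}\), one simple alternative to Newton-Raphson is to stack the n+1 vector constraints column-wise as \(\tilde{A}V=C\) (with V of size n×(n+1) and C of size m×(n+1)) and solve in the least-squares sense. The sketch below assumes the per-sample appendix expectations have already been averaged over i into the columns of V and C; this stacking is only one possible reading of equation 2.10:

```python
import numpy as np

def update_A_tilde(V, C):
    """Solve the stacked system of equation 2.10, A~ V = C, for A~ (m x n) by least squares.
    V is n x (n+1): column 0 holds mu^(t) plus the averaged E(X | Y_i, Theta^(t)),
                    column j (j = 1..n) holds the averaged E_bar(x_j X | Y_i, Theta^(t)).
    C is m x (n+1): column 0 holds the average of (Y_i - A0),
                    column j holds the average of (Y_i - A0) * (E(x_j | Y_i, Theta^(t)) + mu_j^(t)).
    A~ V = C is equivalent to V^T A~^T = C^T."""
    A_tilde_T = np.linalg.lstsq(V.T, C.T, rcond=None)[0]
    return A_tilde_T.T
```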
The first-order partial derivative of Q(Θ | Θ(t)) with regard to S, noting that Q(Θ | Θ(t)) is an analytic function, is:
$$\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial S}=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\frac{\partial \log\left(f\left({Y}_{i}|X,A,S\right)\right)}{\partial S}\text{d}X$$
$$=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left(-\frac{1}{2}{S}^{-1}+\frac{1}{2}{S}^{-1}\left({Y}_{i}-{A}_{0}-\tilde{A}X\right){\left({Y}_{i}-{A}_{0}-\tilde{A}X\right)}^{T}{S}^{-1}\right)\text{d}X$$
The next parameter S(t+1) at the M-step of some tth iteration that maximizes Q(Θ|Θ(t)) is the solution of the equation formed by setting \(\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial S}\) to the zero matrix. By referring to the appendix, we have:
$$\frac{\partial Q\left({\Theta }|{{\Theta }}^{\left(t\right)}\right)}{\partial S}=\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left(-S+\left({Y}_{i}-{A}_{0}-\tilde{A}X\right){\left({Y}_{i}-{A}_{0}-\tilde{A}X\right)}^{T}\right)\text{d}X$$
$$=-NS+\sum _{i=1}^{N}\underset{{\mathbb{R}}^{n}}{\int }f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left(\left({Y}_{i}-{A}_{0}\right){\left({Y}_{i}-{A}_{0}\right)}^{T}-2\tilde{A}\left(X-\mu \right){\left({Y}_{i}-{A}_{0}\right)}^{T}-2\tilde{A}\mu {\left({Y}_{i}-{A}_{0}\right)}^{T}+\tilde{A}\left(X-\mu \right){\left(X-\mu \right)}^{T}{\tilde{A}}^{T}+2\tilde{A}\left(X-\mu \right){\mu }^{T}{\tilde{A}}^{T}+\tilde{A}\mu {\mu }^{T}{\tilde{A}}^{T}\right)\text{d}X$$
$$=-NS+\tilde{A}\mu {\mu }^{T}{\tilde{A}}^{T}+\sum _{i=1}^{N}\left({Y}_{i}-{A}_{0}-2\tilde{A}\mu \right){\left({Y}_{i}-{A}_{0}\right)}^{T}+\sum _{i=1}^{N}\left(-2\tilde{A}E\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left({\left({Y}_{i}-{A}_{0}\right)}^{T}-{\mu }^{T}{\tilde{A}}^{T}\right)+\tilde{A}E\left(X{X}^{T}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right){\tilde{A}}^{T}\right)=\left(0\right)$$
Therefore, we obtain:
\({S}^{\left(t+1\right)}=\frac{1}{N}\left({\tilde{A}}^{\left(t\right)}\mu {\mu }^{T}{\left({\tilde{A}}^{\left(t\right)}\right)}^{T}+\sum _{i=1}^{N}\left({Y}_{i}-{A}_{0}^{\left(t\right)}-2{\tilde{A}}^{\left(t\right)}\mu \right){\left({Y}_{i}-{A}_{0}^{\left(t\right)}\right)}^{T}+\sum _{i=1}^{N}\left(-2{\tilde{A}}^{\left(t\right)}E\left(X|f\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)\left({\left({Y}_{i}-{A}_{0}^{\left(t\right)}\right)}^{T}-{\mu }^{T}{\left({\tilde{A}}^{\left(t\right)}\right)}^{T}\right)+{\tilde{A}}^{\left(t\right)}E\left(X{X}^{T}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right){\left({\tilde{A}}^{\left(t\right)}\right)}^{T}\right)\right)\) (2.11)
Note, µ, A0 and \(\tilde{A}\) are replaced by µ(t), A0(t) and \({\tilde{A}}^{\left(t\right)}\), respectively. In general, the CA method is an EM process with two steps as follows:
E-step:
Determining the conditional PDF f(X | Y, Θ(t)) specified by equation 2.6 based on current parameter Θ(t).
$$f\left(X|Y,{{\Theta }}^{\left(t\right)}\right)\stackrel{\scriptscriptstyle\text{def}}{=}k\left(Y|{{\Theta }}^{\left(t\right)}\right)\text{*}\text{e}\text{x}\text{p}\left(-\frac{1}{2}\left({\left(X-\mu \right)}^{T}{{\Sigma }}^{-1}\left(X-\mu \right)-2{\left(Y-{A}_{0}\right)}^{T}{S}^{-1}\tilde{A}\left(X-\mu \right)\right)\right)$$
M-step:
Calculating the next parameters Θ(t+1) = (µ(t+1), Σ(t+1), A(t+1), S(t+1))T based on f(X | Y, Θ(t)) determined in the E-step, as specified by equations 2.7, 2.8, 2.9, 2.10, and 2.11.
\({\mu }^{\left(t+1\right)}={\mu }^{\left(t\right)}+\frac{1}{N}\sum _{i=1}^{N}E\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\)
\({{\Sigma }}^{\left(t+1\right)}=\frac{1}{N}\sum _{i=1}^{N}E\left(X{X}^{T}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\)
\({A}_{0}^{\left(t+1\right)}=\frac{1}{N}\left(\sum _{i=1}^{N}{Y}_{i}-N{\tilde{A}}^{\left(t\right)}{\mu }^{\left(t\right)}-N{\tilde{A}}^{\left(t\right)}\left({\mu }^{\left(t+1\right)}-{\mu }^{\left(t\right)}\right)\right)\)
\({\tilde{A}}^{\left(t+1\right)}\stackrel{\scriptscriptstyle\text{def}}{=}\left\{\begin{array}{l}\tilde{A}\left({\mu }^{\left(t\right)}+E\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\right)=\frac{1}{N}\sum _{i=1}^{N}\left({Y}_{i}-{A}_{0}\right)\\ {\left.\tilde{A}\stackrel{-}{E}\left({x}_{j}X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)=\frac{1}{N}\sum _{i=1}^{N}\left({Y}_{i}-{A}_{0}\right)\left(E\left({x}_{j}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)+{\mu }_{j}^{\left(t\right)}\right)\right|}_{j=\stackrel{-}{1,n}}\end{array}\right.\)
\({S}^{\left(t+1\right)}=\frac{1}{N}\left({\tilde{A}}^{\left(t\right)}\mu {\mu }^{T}{\left({\tilde{A}}^{\left(t\right)}\right)}^{T}+\sum _{i=1}^{N}\left({Y}_{i}-{A}_{0}^{\left(t\right)}-2{\tilde{A}}^{\left(t\right)}\mu \right){\left({Y}_{i}-{A}_{0}^{\left(t\right)}\right)}^{T}+\sum _{i=1}^{N}\left(-2{\tilde{A}}^{\left(t\right)}E\left(X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\left({\left({Y}_{i}-{A}_{0}^{\left(t\right)}\right)}^{T}-{\mu }^{T}{\left({\tilde{A}}^{\left(t\right)}\right)}^{T}\right)+{\tilde{A}}^{\left(t\right)}E\left(X{X}^{T}|{Y}_{i},{{\Theta }}^{\left(t\right)}\right){\left({\tilde{A}}^{\left(t\right)}\right)}^{T}\right)\right)\)
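Putting both steps together, the CA iteration can be sketched as follows. The routine appendix_expectations is a hypothetical placeholder for the appendix formulas that return, for each Yi, the expectations E(X|Yi, Θ(t)), E(xj|Yi, Θ(t)) and \(\stackrel{-}{E}\left({x}_{j}X|{Y}_{i},{{\Theta }}^{\left(t\right)}\right)\); following the practical remark below, Σ and S are kept fixed in this sketch:

```python
import numpy as np

def ca_em(Y, mu0, A0_0, A_tilde0, Sigma, S, appendix_expectations, iterations=50):
    """Illustrative sketch of the CA algorithm. appendix_expectations(Y_i, theta) is a
    hypothetical placeholder returning (EX_i, Ex_i, ExX_i): the n-vector E(X | Y_i, theta),
    the n-vector of E(x_j | Y_i, theta), and the n x n matrix whose j-th column is
    E_bar(x_j X | Y_i, theta). Sigma and S stay fixed, as suggested in the closing remark."""
    N, m = Y.shape
    n = mu0.shape[0]
    mu, A0, A_tilde = mu0.copy(), A0_0.copy(), A_tilde0.copy()
    for _ in range(iterations):
        theta = (mu, Sigma, A0, A_tilde, S)
        exps = [appendix_expectations(Y[i], theta) for i in range(N)]   # E-step (equation 2.6)
        EX = np.stack([e[0] for e in exps])        # N x n
        Ex = np.stack([e[1] for e in exps])        # N x n
        ExX = np.stack([e[2] for e in exps])       # N x n x n
        mu_next = mu + EX.mean(axis=0)                                      # equation 2.7
        A0_next = Y.mean(axis=0) - A_tilde @ mu - A_tilde @ (mu_next - mu)  # equation 2.9
        # Equation 2.10: stack the n+1 linear constraints A~ V = C and solve by least squares
        R = Y - A0                                 # residuals Y_i - A0, shape N x m
        V = np.empty((n, n + 1))
        C = np.empty((m, n + 1))
        V[:, 0] = mu + EX.mean(axis=0)
        C[:, 0] = R.mean(axis=0)
        for j in range(n):
            V[:, j + 1] = ExX[:, :, j].mean(axis=0)
            C[:, j + 1] = (R * (Ex[:, j] + mu[j])[:, None]).mean(axis=0)
        A_tilde = np.linalg.lstsq(V.T, C.T, rcond=None)[0].T
        mu, A0 = mu_next, A0_next
    return mu, A0, A_tilde
```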
In practice, it is not necessary to compute the covariance matrix Σ(t+1) and the covariance matrix S(t+1), because the computational cost is high and the estimation of Σ and S is not very effective either. Note, the condition that both Σ and S are invertible, with |Σ| ≠ 0 and |S| ≠ 0, is not easy to assert over many computational iterations. The most important parameters are µ and A, and we should fix the other parameters Σ and S with hints from predefined bias or background knowledge.