



Again, I am passionate about this kind of technicality.

I will not redefine the variables from the question, but let me define a few more things. Note that $\mathbf{h}$ has one more dimension than $\mathbf{x}$: its first component is $\mu$, and the remaining components satisfy $x_i = \mu + h_i$. For later convenience, I write $\mathbf{h} = \left[ \begin{array}{c} \mu \\ \mathbf{\tilde{h}} \end{array} \right]$, where $\mu$ is the first component and $\mathbf{\tilde{h}}$ contains the rest.

On the other hand, given $\Sigma_x$ calculated above, one can easily show that its inverse is given by

$$\Sigma_x^{-1} = \left[ \begin{array}{ccc} \frac{1}{S_h} - \frac{S_{\mu}}{S_h} \frac{1}{n S_{\mu} + S_h} & - \frac{S_{\mu}}{S_h} \frac{1}{n S_{\mu} + S_h} & \dotsc \\ - \frac{S_{\mu}}{S_h} \frac{1}{n S_{\mu} + S_h} & \frac{1}{S_h} - \frac{S_{\mu}}{S_h} \frac{1}{n S_{\mu} + S_h} & \dotsc \\ \vdots & \vdots & \ddots \end{array} \right].$$

(Note: I found this out by clumsily going through the calculation below, but verification is easy. And this is actually the main obstacle of this exercise.)
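The verification mentioned in the note can also be done numerically. The sketch below is only an illustration: it assumes $\Sigma_x = S_{\mu} \mathbf{1}\mathbf{1}^T + S_h I$ (which follows from $x_i = \mu + h_i$ with independent $\mu$ and $h_i$), and the function names and numerical values are made up for this example.

```python
import numpy as np

def sigma_x(n, s_mu, s_h):
    """Covariance of x when x_i = mu + h_i (assumed form): S_mu * 1 1^T + S_h * I."""
    return s_mu * np.ones((n, n)) + s_h * np.eye(n)

def sigma_x_inv_closed_form(n, s_mu, s_h):
    """Closed-form inverse quoted in the text."""
    off_diag = -(s_mu / s_h) / (n * s_mu + s_h)
    # Diagonal entries are 1/S_h + off_diag, off-diagonal entries are off_diag.
    return np.eye(n) / s_h + off_diag * np.ones((n, n))

# Illustrative values, not taken from the original question.
n, s_mu, s_h = 4, 2.0, 0.5
product = sigma_x(n, s_mu, s_h) @ sigma_x_inv_closed_form(n, s_mu, s_h)

# If the closed form is right, the product is the identity up to rounding.
print(np.allclose(product, np.eye(n)))
```

The closed form is an instance of the Sherman-Morrison formula for a rank-one update of $S_h I$.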

First we define the joint probability distribution $P(\mathbf{x}, \mathbf{h})$. We know that $\mathbf{h}$ obeys a Gaussian distribution, and $\mathbf{x}$ is a linear superposition of the elements of $\mathbf{h}$. The most appropriate way to describe this is with a Dirac delta function:

$$P(\mathbf{x}, \mathbf{h}) = \frac{1}{\sqrt{(2 \pi)^{n+1} |\Sigma_h|}} \exp \left( -\frac{1}{2} \mathbf{h}^T \Sigma_h^{-1} \mathbf{h} \right) \delta^{(n)} (\mathbf{x} - \mathbf{P} \mathbf{h}).$$

As we know, the probability distribution for $\mathbf{x}$ has the form

$$P(\mathbf{x}) = \frac{1}{\sqrt{(2 \pi)^n |\Sigma_x|}} \exp \left( -\frac{1}{2} \mathbf{x}^T \Sigma_x^{-1} \mathbf{x} \right).$$

However, we can also get $P(\mathbf{x})$ by integrating out $\mathbf{h}$:

$$\begin{aligned}
P(\mathbf{x}) &= \int d^{n+1} \mathbf{h} \, P(\mathbf{x}, \mathbf{h}) \\
&= \frac{1}{\sqrt{(2\pi)^{n+1} |\Sigma_h|}} \int d\mu \int d^n \mathbf{\tilde{h}} \exp \left( - \frac{1}{2} \frac{\mu^2}{S_{\mu}} - \frac{1}{2} \mathbf{\tilde{h}}^T \Sigma_{\tilde{h}}^{-1} \mathbf{\tilde{h}} \right) \prod_{i=1}^n \delta(x_i - \mu - h_i) \\
&= \frac{1}{\sqrt{(2\pi)^{n+1} S_{\mu} S_h^n}} \int d\mu \exp\left[ -\frac{1}{2} \left( \frac{1}{S_{\mu}} + \frac{n}{S_h} \right) \left( \mu - \frac{1}{\frac{1}{S_{\mu}} + \frac{n}{S_h}} \sum_{i=1}^n \frac{x_i}{S_h} \right)^2 - \frac{1}{2} \mathbf{x}^T \Sigma_x^{-1} \mathbf{x} \right] \\
&= \frac{1}{\sqrt{(2\pi)^n S_{\mu} S_h^n \left( \frac{1}{S_{\mu}} + \frac{n}{S_h} \right)}} \exp\left( - \frac{1}{2} \mathbf{x}^T \Sigma_x^{-1} \mathbf{x} \right),
\end{aligned}$$

where you can see that the determinant of $\Sigma_x$ is $|\Sigma_x| = S_{\mu} S_h^n \left( \frac{1}{S_{\mu}} + \frac{n}{S_h} \right)$.
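As a quick sanity check on this determinant, one can compare it with a numerical determinant. This is a minimal sketch, again assuming $\Sigma_x = S_{\mu} \mathbf{1}\mathbf{1}^T + S_h I$; the variable names and values are chosen only for illustration.

```python
import numpy as np

# Illustrative values, not taken from the original question.
n, s_mu, s_h = 5, 1.5, 0.7

# Assumed form of the covariance of x: S_mu * 1 1^T + S_h * I.
cov_x = s_mu * np.ones((n, n)) + s_h * np.eye(n)

# Determinant read off from the normalisation of P(x) in the text.
det_closed_form = s_mu * s_h**n * (1.0 / s_mu + n / s_h)
det_numeric = np.linalg.det(cov_x)

print(det_numeric, det_closed_form)
```

The closed form can equivalently be written as $S_h^{n-1} (S_h + n S_{\mu})$.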

Then the conditional expectation can be calculated:

$$\begin{aligned}
E( \mathbf{h} | \mathbf{x} ) &= \int d^{n+1} \mathbf{h} \, [\mathbf{h} P(\mathbf{h} | \mathbf{x})] = \int d^{n+1} \mathbf{h} \, \frac{\mathbf{h} P(\mathbf{x}, \mathbf{h})}{P(\mathbf{x})} \\
&= \frac{1}{\sqrt{2\pi \frac{|\Sigma_h|}{|\Sigma_x|}}} \int d\mu \int d^n \mathbf{\tilde{h}} \left[ \begin{array}{c} \mu \\ \mathbf{\tilde{h}} \end{array} \right] \exp \left( -\frac{1}{2} \frac{\mu^2}{S_{\mu}} - \frac{1}{2} \mathbf{\tilde{h}}^T \Sigma_{\tilde{h}}^{-1} \mathbf{\tilde{h}} + \frac{1}{2} \mathbf{x}^T \Sigma_x^{-1} \mathbf{x} \right) \prod_{i=1}^n \delta(x_i - \mu - h_i) \\
&= \sqrt{\frac{\frac{1}{S_{\mu}}+\frac{n}{S_h}}{2\pi}} \int d\mu \left[ \begin{array}{c} \mu \\ \mathbf{x} - \mu \mathbf{1}_c \end{array} \right] \exp \left( -\frac{1}{2} \frac{\mu^2}{S_{\mu}} - \frac{1}{2} (\mathbf{x} - \mu \mathbf{1}_c)^T \Sigma_{\tilde{h}}^{-1} (\mathbf{x} - \mu \mathbf{1}_c) + \frac{1}{2} \mathbf{x}^T \Sigma_x^{-1} \mathbf{x} \right).
\end{aligned}$$

In the above, we simply use the definition of conditional probability and integrate over $\mathbf{\tilde{h}}$ using the Dirac delta function. There is a lot of algebra involved, which readers can verify on their own. Continuing the calculation gives

$$\begin{aligned}
E( \mathbf{h} | \mathbf{x} ) &= \sqrt{\frac{\frac{1}{S_{\mu}}+\frac{n}{S_h}}{2\pi}} \int d\mu \left[ \begin{array}{c} \mu \\ \mathbf{x} - \mu \mathbf{1}_c \end{array} \right] \exp \left[ -\frac{1}{2} \left( \frac{1}{S_{\mu}} + \frac{n}{S_h} \right) \left( \mu - \frac{S_{\mu}}{S_h+nS_{\mu}} \sum_{i=1}^n x_i \right)^2 \right] \\
&= \left[ \begin{array}{c} \frac{S_{\mu}}{S_h+nS_{\mu}} \sum_{i=1}^n x_i \\ \mathbf{x} - \frac{S_{\mu}}{S_h+nS_{\mu}} \sum_{i=1}^n x_i \mathbf{1}_c \end{array} \right].
\end{aligned}$$

Careful algebra shows that this is exactly equal to $\Sigma_h P^T \Sigma_x^{-1} \mathbf{x}$.
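The claimed equality can be checked numerically before doing the algebra by hand. The sketch below assumes $\Sigma_h = \mathrm{diag}(S_{\mu}, S_h, \dotsc, S_h)$ and $P = [\mathbf{1}_c \; I_n]$, which is what $x_i = \mu + h_i$ and the exponent in the integrals above imply; the function names and test values are hypothetical.

```python
import numpy as np

def posterior_mean_closed_form(x, s_mu, s_h):
    """Result of the integral: first entry is E(mu|x), the rest is x - E(mu|x)."""
    n = x.size
    mu_hat = s_mu / (s_h + n * s_mu) * x.sum()
    return np.concatenate(([mu_hat], x - mu_hat))

def posterior_mean_matrix_form(x, s_mu, s_h):
    """Sigma_h P^T Sigma_x^{-1} x with the assumed P = [1, I] and diagonal Sigma_h."""
    n = x.size
    P = np.hstack([np.ones((n, 1)), np.eye(n)])
    cov_h = np.diag([s_mu] + [s_h] * n)
    cov_x = P @ cov_h @ P.T
    # Solve a linear system instead of forming the inverse explicitly.
    return cov_h @ P.T @ np.linalg.solve(cov_x, x)

# Illustrative data, not taken from the original question.
x = np.array([0.3, -1.2, 2.0, 0.7])
closed = posterior_mean_closed_form(x, 2.0, 0.5)
matrix = posterior_mean_matrix_form(x, 2.0, 0.5)
print(np.allclose(closed, matrix))
```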

I have skipped many details, but readers can verify this on their own.

P.S.: For me, the most difficult part was finding the inverse of $\Sigma_x$. I found it by calculating the integrals above to get the matrix elements, and then verified it by multiplying it by $\Sigma_x$ to check that the product is the identity matrix. The difference in the number of dimensions of $\mathbf{x}$ and $\mathbf{h}$ does cause some inconvenience. However, this is more a matter of menial algebra than an intellectually challenging problem. If you want a taste without much algebra, take a small number of dimensions and do the calculation in Mathematica first to get a feeling for it.
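For readers without Mathematica, the same low-dimensional experiment can be sketched in SymPy. This is only an illustration for $n = 3$, assuming $\Sigma_x = S_{\mu} \mathbf{1}\mathbf{1}^T + S_h I$; the symbol and variable names are chosen here and are not from the original.

```python
import sympy as sp

S_mu, S_h = sp.symbols("S_mu S_h", positive=True)
n = 3  # small dimension, just to get a feeling

ones = sp.ones(n, n)
cov_x = S_mu * ones + S_h * sp.eye(n)  # assumed form of Sigma_x

# Let the computer algebra system invert the matrix symbolically.
inv_symbolic = cov_x.inv()

# Closed form quoted earlier in the text.
inv_closed = sp.eye(n) / S_h - (S_mu / S_h) / (n * S_mu + S_h) * ones

# Entry-wise difference; every entry should simplify to zero.
difference = (inv_symbolic - inv_closed).applyfunc(sp.simplify)
print(difference)

# Factored determinant, for comparison with the normalisation of P(x).
print(sp.factor(cov_x.det()))
```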

First, let me recommend The Matrix Cookbook. With it, mom never has to worry about my matrix derivations again!



One obtains $E(h|x) = \mathrm{Cov}(h,x) \cdot \mathrm{Cov}(x,x)^{-1} x$ (1)

Then, by formula (314) in the Cookbook: $\mathrm{Cov}(Ax, By) = A \cdot \mathrm{Cov}(x,y) \cdot B^{T}$




Substituting into (1) gives the result $E(h|x) = \Sigma_h P^T \Sigma_x^{-1} x = \Sigma_h P^T (P \Sigma_h P^T)^{-1} x$ (2)
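The two covariances that enter (1), namely $\mathrm{Cov}(h, x) = \Sigma_h P^T$ and $\mathrm{Cov}(x, x) = P \Sigma_h P^T$ with $x = Ph$, can be checked by sampling. The sketch below is a rough illustration under assumed values: a zero-mean Gaussian $h$ with $\Sigma_h = \mathrm{diag}(S_{\mu}, S_h, \dotsc, S_h)$ and $P = [\mathbf{1}_c \; I_n]$. The sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup, not taken from the original question.
n, s_mu, s_h = 3, 2.0, 0.5
P = np.hstack([np.ones((n, 1)), np.eye(n)])   # x = P h, i.e. x_i = mu + h_i
cov_h = np.diag([s_mu] + [s_h] * n)

# Draw zero-mean samples of h (one per row) and map them to x.
h = rng.multivariate_normal(np.zeros(n + 1), cov_h, size=200_000)
x = h @ P.T

# Empirical second moments (the means are zero by construction).
cov_hx_empirical = h.T @ x / len(h)
cov_xx_empirical = x.T @ x / len(x)

# Largest deviations from the covariance rule; both should be small.
err_hx = np.abs(cov_hx_empirical - cov_h @ P.T).max()
err_xx = np.abs(cov_xx_empirical - P @ cov_h @ P.T).max()
print(err_hx, err_xx)
```

The deviations shrink roughly like one over the square root of the sample size, so they are small but not exactly zero.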



After going around in circles all this time, $x = Ph \Rightarrow h = P^{-1}x$...






