Wednesday, 1 September 2010

it.information-theory - Question regarding divergence

The inequality $D(P\|Q) \ge D(P^*\|Q^*)$ does not need to hold.



Here is an example.



Let $A$ be the set $\{1,2,3,\dots,n\}$. Let $E$ be the set of measures $P$ on $A$ such that $P(1)=0$. Projecting a measure $P$ onto $E$ using $D$ is equivalent to conditioning $P$ on $A \setminus \{1\}$. Choose $P$ and $Q$ such that they both put equal and nonzero mass on $1$ (and $P \neq Q$, so that $D(P\|Q) > 0$). By direct computation one sees: $D(P^*\|Q^*) = \frac{1}{1-P(1)}\,D(P\|Q) > D(P\|Q)$.
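
The counterexample is easy to check numerically. Here is a minimal sketch in plain NumPy; the particular values of $P$ and $Q$ below are arbitrary choices for illustration, not taken from the question:

```python
import numpy as np

def D(p, q):
    """Relative entropy D(p || q) in nats, with the convention 0 * log(0/0) = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# P and Q on {1, 2, 3}, both putting the same nonzero mass on the point 1.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.5, 0.1, 0.4])

# Projections onto E = {P : P(1) = 0}: condition on {2, 3}.
p_star = np.array([0.0, *(p[1:] / p[1:].sum())])
q_star = np.array([0.0, *(q[1:] / q[1:].sum())])

print(D(p, q))            # ~0.191
print(D(p_star, q_star))  # ~0.382 = D(P||Q) / (1 - P(1)) > D(P||Q)
```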



The details of the above computation are as follows.



For ease of notation set $n=3$. Let $E$ be the set of measures $P$ with $P(1)=\epsilon$; to obtain the example above, one sets $\epsilon=0$. Let us parametrize the measures on $\{1,2,3\}$ as follows: $P(1)=p_1$, $P(2)=p_2$ and $P(3)=1-p_1-p_2$. Our problem is:
$$\inf_{Q \in E}\left[\, p_1 \log\frac{p_1}{q_1} + p_2 \log\frac{p_2}{q_2} + (1-p_1-p_2)\log\frac{1-p_1-p_2}{1-q_1-q_2}\,\right].$$


Let $F$ denote the expression after the infimum.
$F$ is strictly convex in $Q$ and therefore has a unique optimizer. In the above coordinates, the normal to $E$ is the vector $(1,0)$, so at the optimizer the gradient of $F$ must be parallel to this normal. Then
$$\frac{\partial F}{\partial q_1} = -\frac{p_1}{q_1} + \frac{1-p_1-p_2}{1-q_1-q_2} = \lambda$$

and
$$\frac{\partial F}{\partial q_2} = -\frac{p_2}{q_2} + \frac{1-p_1-p_2}{1-q_1-q_2} = 0.$$

We have the constraint that $Q \in E$, i.e., $q_1 = \epsilon$.
Substituting $q_1 = \epsilon$ into the second equality and cross-multiplying gives $p_2(1-\epsilon-q_2) = q_2(1-p_1-p_2)$, from which one infers:
$$q_2 = \frac{(1-\epsilon)\,p_2}{1-p_1}.$$
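
As a sanity check, one can verify symbolically that this $q_2$ satisfies the stationarity condition $\partial F / \partial q_2 = 0$; here is a sketch using SymPy (assuming SymPy is available):

```python
import sympy as sp

p1, p2, q1, q2, eps = sp.symbols('p1 p2 q1 q2 epsilon', positive=True)

# F = D(P || Q) in the coordinates above.
F = (p1 * sp.log(p1 / q1) + p2 * sp.log(p2 / q2)
     + (1 - p1 - p2) * sp.log((1 - p1 - p2) / (1 - q1 - q2)))

# Plug q1 = eps and the claimed optimizer q2 into dF/dq2.
at_opt = sp.diff(F, q2).subs({q1: eps, q2: (1 - eps) * p2 / (1 - p1)})
print(sp.simplify(at_opt))   # prints 0
```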



Going back to the coordinates $(p_1,p_2,p_3)$ to denote a measure on $\{1,2,3\}$,
projecting a measure onto $E$ using $D$ corresponds to the following map:
$$(p_1,p_2,p_3) \rightarrow \left(\epsilon,\; (1-\epsilon)\frac{p_2}{p_2+p_3},\; (1-\epsilon)\frac{p_3}{p_2+p_3}\right).$$
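
One can also confirm this map numerically by minimizing $F$ over $E$ directly; below is a sketch using scipy.optimize, where the values of $\epsilon$, $p_1$, $p_2$ are arbitrary illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

eps, p1, p2 = 0.1, 0.5, 0.3
p3 = 1 - p1 - p2

def F(q):
    """D(P || Q) as a function of (q1, q2)."""
    q1, q2 = q
    return (p1 * np.log(p1 / q1) + p2 * np.log(p2 / q2)
            + p3 * np.log(p3 / (1 - q1 - q2)))

# Minimize F over E = {Q : q1 = eps}.
res = minimize(F, x0=[eps, 0.4],
               constraints=[{"type": "eq", "fun": lambda q: q[0] - eps}],
               bounds=[(1e-9, 1 - 1e-9)] * 2)

print(res.x)                              # ~ (0.1, 0.54), the numerical minimizer
print((eps, (1 - eps) * p2 / (p2 + p3)))  # the map above gives (0.1, 0.54)
```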


For $\epsilon=0$, this is the same as conditioning $P$ on $\{2,3\}$.



One obtains the expression $D(P^*\|Q^*) = \frac{1}{1-P(1)}\,D(P\|Q)$ given above by computing the relative entropy directly from this formula for the projections.
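
For instance, with $n=3$, $\epsilon=0$ and $Q(1)=P(1)$, the following SymPy sketch confirms the factor $\frac{1}{1-P(1)}$ symbolically:

```python
import sympy as sp

p1, p2, q2 = sp.symbols('p1 p2 q2', positive=True)
p3, q3 = 1 - p1 - p2, 1 - p1 - q2   # both measures put mass p1 on the point 1

P, Q = (p1, p2, p3), (p1, q2, q3)

# Projection with eps = 0: condition on {2, 3}.
Ps = (p2 / (p2 + p3), p3 / (p2 + p3))
Qs = (q2 / (q2 + q3), q3 / (q2 + q3))

D_PQ   = sum(a * sp.log(a / b) for a, b in zip(P, Q))    # the p1-term is p1*log(1) = 0
D_star = sum(a * sp.log(a / b) for a, b in zip(Ps, Qs))

print(sp.simplify(D_star - D_PQ / (1 - p1)))   # prints 0
```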
