Wednesday, 1 September 2010

it.information theory - Question regarding divergence

The inequality $D(P'|Q') ge D(P^star| Q^star)$ does not need to hold.



Here is an example.



Let $A$ be the set ${1,2,3,...,n}$. Let $E$ be the set of measures $P$ on $A$ such that $P({1}) = 0$. Projecting a measure $P$ on $E$ using $D$ is equivalent to conditioning $P$ on $ A- {1}$. Choose $P'$ and $Q'$ such that they both put equal and nonzero mass on ${1}$. By direct computation one sees: $D(P^star| Q^star) = frac{1}{1-P'({1})} D(P'|Q') > D(P' | Q')$.



The details of the above computation are as follows.



For ease of notation set $n=3$. Let $E$ be the set of measures $P$ with $P({1}) =epsilon$; to obtain the example above, one sets $epsilon = 0$. Let us parametrize the measures on ${1,2,3}$ as follows: $P({1}) = p_1$, $P({2}) =p_2$ and $P({3}) = 1-p_1 -p_2$. Our problem is:
$$
inf_{ Q in E}left[ p_1 log frac{p_1}{q_1} + p_2 log frac{p_2}{q_2} + (1-p_1 -p_2) logfrac{ 1- p_1 - p_2}{ 1- q_1 - q_2 } right].
$$
Let $F$ denote the expression after the $inf$.
$F$ is strictly convex in $Q$ and therefore will have a unique optimizer. In the above coordinates, the normal to $E$ is the vector $(1,0)$. Then
$$
frac{partial F} {partial q_1} = -frac{p_1}{q_1} + frac{1-p_1-p_2}{1-q_1-q_2} = lambda
$$
and
$$
frac{partial F} {partial q_2} = -frac{p_2}{q_2} + frac{1-p_1-p_2}{1-q_1-q_2} = 0.
$$
We have the constraint that $Qin E$, i.e., $q_1 =epsilon$.
From the last two equalities one infers:
$$
q_2 = frac{(1-epsilon) p_2}{ 1-p_1}.
$$



Going back to the coordinates $(p_1,p_2,p_3)$ to denote a measure on ${1,2,3}$,
projecting a measure on $E$ using $D$ corresponds to the following map:
$$
(p_1,p_2,p_3) rightarrow left(epsilon, (1-epsilon)frac{p_2}{p_2+p_3}, (1-epsilon)frac{p_3}{p_2 + p_3}right).
$$
For $epsilon =0$, this is the same as conditioning $P$ on ${2,3}$.



One obtains the expression for the relative entropy given above by directly computing it using this formula for the projections.

No comments:

Post a Comment