Monday, 4 October 2010

it.information theory - Using Fisher Information to bound KL divergence

Is it possible to use Fisher Information at p to get a useful upper bound on KL(q,p)?



KL(q,p) is known as the Kullback-Leibler divergence and is defined for discrete distributions over k outcomes as follows:



$$KL(q,p)=\sum_{i=1}^k q_i \log \frac{q_i}{p_i}$$
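
Here is a minimal sketch of this definition in Python/NumPy (natural log; the function name kl_divergence and the example values are my own):

```python
import numpy as np

def kl_divergence(q, p):
    """KL(q, p) for discrete distributions over the same k outcomes.
    Terms with q_i = 0 contribute 0; assumes p_i > 0 wherever q_i > 0."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# k = 3 example
print(kl_divergence([0.5, 0.3, 0.2], [1/3, 1/3, 1/3]))  # approximately 0.069
```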



The most obvious approach is to use the fact that 1/2 x' I x is the second-order Taylor expansion of KL(p+x,p), where I is the Fisher Information Matrix evaluated at p, and to try to use that quadratic form as an upper bound (derivation of the expansion from Kullback's book: 1,2,3).
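
To make this concrete, here is a sketch of that expansion for the multinomial case, taking the probabilities themselves as the parameters (in which case the Fisher Information Matrix defined below reduces to the diagonal matrix with entries 1/p_i):

$$KL(p+x,p)=\sum_i (p_i+x_i)\log\Big(1+\frac{x_i}{p_i}\Big)=\sum_i \Big(x_i+\frac{x_i^2}{2p_i}\Big)+O(\|x\|^3)=\frac{1}{2}x' I x+O(\|x\|^3)$$

using the expansion log(1+u) = u - u^2/2 + O(u^3) and the fact that the components of x sum to zero, since p and p+x are both probability vectors.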



If p(x,t) gives the probability of observation x in a discrete distribution parameterized by the parameter vector t, the Fisher Information Matrix is defined as follows:



$$I_{ij}(t)=\sum_x p(x,t) \left(\frac{\partial}{\partial t_i} \log p(x,t)\right) \left(\frac{\partial}{\partial t_j} \log p(x,t)\right)$$



where the sum is taken over all possible observations.
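
Here is a minimal sketch of evaluating this sum numerically; the function name fisher_information and the callables p_of_x and grad_log_p are my own scaffolding, not from any particular library. For a multinomial parameterized directly by its own probabilities it recovers I = diag(1/p_i), consistent with the expansion above.

```python
import numpy as np

def fisher_information(p_of_x, grad_log_p, t):
    """I_ij(t) = sum_x p(x,t) (d/dt_i log p(x,t)) (d/dt_j log p(x,t)).

    p_of_x(t)     -> probabilities of all observations x, shape (num_x,)   (assumed helper)
    grad_log_p(t) -> gradients d/dt_j log p(x,t), shape (num_x, dim_t)     (assumed helper)
    """
    p = p_of_x(t)
    g = grad_log_p(t)
    return (g * p[:, None]).T @ g  # weighted sum over observations x

# Multinomial parameterized directly by its probabilities t:
# log p(x,t) = log t_x, so d/dt_j log p(x,t) = delta_{xj}/t_x and I(t) = diag(1/t).
t = np.array([0.5, 0.3, 0.2])
print(fisher_information(lambda t: t, lambda t: np.diag(1.0 / t), t))
```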



Below is a visualization of the sets of k=3 multinomial distributions, for some random p's (marked as black dots), where this bound holds. From the plots it seems that the bound holds for distributions that lie "between" p and the "furthest" zero-entropy distribution.





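As a quick numerical stand-in for those plots (this is not the original plotting code; the reference p and the Dirichlet sampling are my own choices), one can sample q from the k=3 simplex and check for which q the quadratic form with x = q - p upper-bounds KL(q,p):

```python
import numpy as np

def kl(q, p):
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def quadratic(q, p):
    # 1/2 x' I x with I = diag(1/p_i) and x = q - p
    x = q - p
    return 0.5 * float(np.sum(x**2 / p))

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])                 # one reference distribution p
qs = rng.dirichlet(np.ones(3), size=10_000)   # uniform samples from the k=3 simplex
holds = np.array([quadratic(q, p) >= kl(q, p) for q in qs])
print(f"candidate bound holds for {holds.mean():.1%} of sampled q")
```
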
Motivation: Sanov's theorem bounds the probability of an event in terms of the KL divergence of the most likely outcome, but KL divergence is unwieldy, and it would be nicer to have a simpler bound, especially one that can be expressed easily in terms of the parameters of the distribution we are working with.
