Tuesday, 12 October 2010

it.information theory - Sanov's Theorem and Chernoff bound

You don't need to restrict yourself to finite $E$ -- you get it for any set $E$ of distributions. In hypothesis testing people often take $E$ to be come convex set of distributions.



[added after comment]:
I think the question is confusingly stated. The probability being calculated is not the probability of a hypothesis per se. Cover and Thomas simplify the notation a bit. let $X^n = (X_1, ldots, X_n)$ be drawn i.i.d. according to $Q$ on a finite set $mathcal{X}$. Let $E$ be any set of distributions. Let $T_{X^n}$ be the empirical distribution of $X_1, ldots, X_n$. Sanov's Theorem says:



$mathbb{P}( { X^n : T_{X^n} in E } ) le (n+1)^{|mathcal{X}|} exp( -n cdot min_{P in E} D(P | Q) )$



This bound holds for any $n$, but is only optimal asymptotically as $n to infty$. That is, the exponent won't be better than this divergence. For smaller $n$ you can get better bounds, especially if you can get rid of that polynomial factor. The looseness in Sanov's theorem as stated is mostly in this polynomial factor -- if $E$ is a singleton, then if you inspect the proof you see that you don't need that factor.



To take an example from coin tossing, suppose $Q$ corresponds to a coin with bias $1/2$ (a fair coin) and let $E = { P : P(heads) le 1/4 }$. This is of the form "sample mean is less than 1/4" (if I understand your question). Your complaint would seem to be that as stated Sanov makes you account for coins with bias much less than 1/4, when in fact you can "get away" with only considering $P(heads) = 1/4$.



So if I understand your question now, the answer is : Sanov's theorem is loose because there is a union being taken over all distributions in $E$, but this looseness is mostly important for small $n$. The bound holds for all $n$ but for smaller $n$ you can get better bounds (e.g. Chernoff) by re-inspecting the proof of Sanov or through other methods. However, as $n to infty$ these bounds will not be better than Sanov's in terms of the exponent multiplying $-n$.



Hope that answers it!

No comments:

Post a Comment