Monday, 27 September 2010

pr.probability - Notions of "independent" and "uncorrelated" for subsets of the natural numbers

There are several ways to assign probability measures to the set of natural numbers. Consider the probability measure $P_s$ on the positive integers that assigns probability $n^{-s}/\zeta(s)$ to the integer $n$, where $\zeta$ is the Riemann zeta function. ($s$ is a constant real number greater than $1$.)



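Then under this measure, being a multiple of $a$ and being a multiple of $b$ are independent events, in the probabilistic sense, if and only if $a$ and $b$ have no common factor greater than $1$. You can show this starting from the fact that the measure assigned to the set of multiples of $k$, for a positive integer $k$, is
$$ \frac{1}{\zeta(s)} \sum_{n=1}^\infty \frac{1}{(kn)^s} = \frac{1}{\zeta(s)} \cdot \frac{1}{k^s} \zeta(s) = \frac{1}{k^s}. $$
That is, the probability that a random positive integer is divisible by $k$ is $k^{-s}$. The integers divisible by both $a$ and $b$ are exactly the multiples of $\operatorname{lcm}(a,b)$, so the probability of both events is $\operatorname{lcm}(a,b)^{-s}$; this equals $a^{-s} b^{-s}$ exactly when $\operatorname{lcm}(a,b) = ab$, that is, when $a$ and $b$ are coprime. Of course you really want all integers to be equally likely, which should correspond to $s = 1$; but $P_1$ does not exist, since $\zeta(1)$ (the harmonic series) diverges, so one has to take a limit as $s \to 1^+$.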
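
As a rough numerical check, the sketch below approximates $P_s$ by truncating both the sum over a set and the normalizing sum $\zeta(s)$ at a cutoff. The helper names `p_s` and `multiple_of`, the cutoff `n_max`, and the choice $s = 2$ are assumptions of this illustration, not part of Rota's construction. It should give roughly $k^{-s}$ for the multiples of $k$, and it shows independence for the coprime pair 2, 3 but not for 4, 6.

```python
def p_s(pred, s, n_max=100_000):
    """Approximate P_s({n >= 1 : pred(n)}).

    Hypothetical helper: both the weighted count of the set and the
    normalizing constant zeta(s) are truncated at n_max, so the result is
    only accurate when s is comfortably larger than 1.
    """
    total = 0.0  # truncated zeta(s)
    hit = 0.0    # truncated sum of n**(-s) over the set
    for n in range(1, n_max + 1):
        w = n ** -s
        total += w
        if pred(n):
            hit += w
    return hit / total


def multiple_of(k):
    """Predicate for the set of positive multiples of k."""
    return lambda n: n % k == 0


s = 2.0
# P_s(multiples of k) should be close to k**(-s).
for k in (2, 3, 4, 6, 12):
    print(k, p_s(multiple_of(k), s), k ** -s)

# 2 and 3 are coprime: P(div by 6) matches P(div by 2) * P(div by 3).
print(p_s(multiple_of(6), s), p_s(multiple_of(2), s) * p_s(multiple_of(3), s))

# 4 and 6 are not: "div by 4 and by 6" means "div by 12", and
# P(div by 12) is about 1/144, not P(div by 4) * P(div by 6), about 1/576.
print(p_s(multiple_of(12), s), p_s(multiple_of(4), s) * p_s(multiple_of(6), s))
```

Truncation keeps the sketch dependency-free; for $s$ close to $1$ the tail of the series decays slowly, so this approach is only meant for values of $s$ well above $1$.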



(I learned this from Gian-Carlo Rota, Combinatorial Snapshots.)



Under "suitable conditions" (Rota doesn't say what they are, though it suffices that $A$ has a natural density), the density of a set $A$ of natural numbers is the limit $\lim_{s \to 1^+} P_s(A)$.
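
As a worked illustration (not taken from Rota's text), let $A$ be the set of squarefree numbers. For $s > 1$ the Euler product gives
$$ \sum_{n \text{ squarefree}} n^{-s} = \prod_p \left(1 + p^{-s}\right) = \prod_p \frac{1 - p^{-2s}}{1 - p^{-s}} = \frac{\zeta(s)}{\zeta(2s)}, $$
so
$$ P_s(A) = \frac{1}{\zeta(s)} \cdot \frac{\zeta(s)}{\zeta(2s)} = \frac{1}{\zeta(2s)} \longrightarrow \frac{1}{\zeta(2)} = \frac{6}{\pi^2} \quad \text{as } s \to 1^+, $$
which agrees with the well-known natural density $6/\pi^2$ of the squarefree numbers.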



It might then be reasonable to define correlation between sets of natural numbers in the same way. Let $A$ and $B$ be two sets of natural numbers, and let $X$ and $Y$ be the indicator random variables of $A$ and $B$ under the measure $P_s$. The Pearson correlation coefficient between $X$ and $Y$ is
$$ \frac{E(XY) - E(X) E(Y)}{\sigma_X \sigma_Y} $$
where $E$ is expectation and $\sigma$ is standard deviation. This simplifies when $X$ and $Y$ are indicators (and thus take only the values $0$ and $1$); it becomes
$$ \frac{P_s(A \cap B) - P_s(A) P_s(B)}{\sqrt{P_s(A) P_s(B) (1-P_s(A)) (1-P_s(B))}} $$
We could then define the correlation between $A$ and $B$ to be the limit of this as $s \to 1^+$.
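
The simplification for indicators uses only the fact that an indicator satisfies $X^2 = X$. Writing $p = P_s(A)$ and $q = P_s(B)$ as shorthand:
$$ E(X) = E(X^2) = p, \qquad \sigma_X^2 = E(X^2) - E(X)^2 = p(1-p), $$
$$ E(Y) = E(Y^2) = q, \qquad \sigma_Y^2 = q(1-q), \qquad E(XY) = P_s(A \cap B), $$
and substituting these into the Pearson coefficient gives the simplified formula for indicators.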



For example, if $A$ is the event "divisible by 2" and $B$ is the event "divisible by 3", then $A \cap B$ is the event "divisible by 6". So $P_s(A \cap B) = 6^{-s}$, $P_s(A) = 2^{-s}$, and $P_s(B) = 3^{-s}$; the numerator is $6^{-s} - 2^{-s} 3^{-s} = 0$, so the correlation is zero.



But if $A$ is the event "divisible by 4" and $B$ is the event "divisible by 6", then $A \cap B$ is the event "divisible by 12", while $P_s(A) P_s(B) = 4^{-s} 6^{-s} = 24^{-s}$. So the correlation with respect to $P_s$ is
$$ \frac{12^{-s} - 24^{-s}}{\sqrt{4^{-s} 6^{-s} (1-4^{-s}) (1-6^{-s})}} $$
which has the limit $1/\sqrt{15}$ as $s \to 1^+$. More generally, for $a, b > 1$ the limiting correlation between being divisible by $a$ and being divisible by $b$ is
$$ \frac{ab - \operatorname{lcm}(a,b)}{\operatorname{lcm}(a,b) \sqrt{(a-1)(b-1)}} $$
and this may or may not be what you want.
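
The limiting formula can be checked numerically. The sketch below is an illustration under stated assumptions: the function names `corr_s`, `corr_limit` and `lcm` are hypothetical, and it uses the exact values $P_s(\text{multiples of } k) = k^{-s}$ rather than any truncated sum. Since those values are continuous in $s$, so is the correlation, and its limit as $s \to 1^+$ is simply its value at $s = 1$.

```python
from math import gcd, sqrt


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def corr_s(a, b, s):
    """Pearson correlation under P_s of the indicators of 'divisible by a'
    and 'divisible by b', using P_s(multiples of k) = k**(-s).
    Requires a, b > 1 (otherwise an indicator is constant and the
    denominator vanishes)."""
    pa, pb, pab = a ** -s, b ** -s, lcm(a, b) ** -s
    return (pab - pa * pb) / sqrt(pa * pb * (1 - pa) * (1 - pb))


def corr_limit(a, b):
    """Closed form for the limit s -> 1+:
    (ab - lcm(a, b)) / (lcm(a, b) * sqrt((a - 1)(b - 1)))."""
    m = lcm(a, b)
    return (a * b - m) / (m * sqrt((a - 1) * (b - 1)))


# Divisibility by 4 and by 6: the P_s correlation approaches 1/sqrt(15).
for s in (2.0, 1.1, 1.01, 1.001):
    print(s, corr_s(4, 6, s))
print(corr_limit(4, 6), 1 / sqrt(15))

# Coprime moduli: the correlation is zero (up to floating-point rounding).
print(corr_s(2, 3, 1.5), corr_limit(2, 3))
```

Using the closed form $k^{-s}$ avoids summing slowly converging series near $s = 1$, which is exactly where the limit is taken.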
