Monday, 27 September 2010

pr.probability - Notions of "independent" and "uncorrelated" for subsets of the natural numbers

There are several ways to assign probability measures to the set of natural numbers. Consider the probability measure $P_s$ on the positive integers which assigns "probability" $n^{-s}/\zeta(s)$ to the integer $n$. (Here $s$ is a constant real number greater than 1.)



Then under this measure, being a multiple of $a$ and being a multiple of $b$ are independent events, in the probabilistic sense, if $a$ and $b$ have no common factor (that is, if they are relatively prime). You can show this starting from the fact that the measure assigned to the set of multiples of $k$, for a positive integer $k$, is
$$\frac{1}{\zeta(s)} \sum_{n=1}^\infty \frac{1}{(kn)^s} = \frac{1}{\zeta(s)} \cdot \frac{1}{k^s} \zeta(s) = \frac{1}{k^s}.$$


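That is, the probability that a random positive integer is divisible by $k$ is $k^{-s}$. In particular, if two integers have no common factor, then a number is divisible by both exactly when it is divisible by their product, so the probability of being divisible by both is the product of the two individual probabilities; this is the independence claimed above. Of course, what you really want is for all integers to be equally likely, which should correspond to $s = 1$.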
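As a rough numerical check of the formula above, here is a short Python sketch. It approximates $P_s$ by truncating to the integers $1, \dots, N$ and renormalising; the cutoff `N`, the exponent `s = 2`, the pair $4, 9$, and the helper names `truncated_weights` and `prob` are my own illustrative choices, not anything from Rota.

```python
# A sketch, assuming we approximate P_s by truncating to 1..N and
# renormalising. N and s are arbitrary illustrative choices.

def truncated_weights(s, N):
    """Weights n**(-s) / Z for n = 1..N, where Z is the sum of the n**(-s).

    As N grows (with s > 1 fixed), Z approaches zeta(s) and these weights
    approach the measure P_s."""
    raw = [n ** (-s) for n in range(1, N + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def prob(weights, predicate):
    """Probability of the set {n : predicate(n)} under the weights."""
    return sum(w for n, w in enumerate(weights, start=1) if predicate(n))


s, N = 2.0, 100_000
w = truncated_weights(s, N)

# P_s(multiples of k) should be close to k**(-s).
for k in (2, 3, 5, 12):
    print(k, prob(w, lambda n: n % k == 0), k ** (-s))

# For a, b with no common factor, divisibility by a and by b should be
# independent: P(a | n and b | n) = P(a | n) * P(b | n).
a, b = 4, 9
p_a = prob(w, lambda n: n % a == 0)
p_b = prob(w, lambda n: n % b == 0)
p_ab = prob(w, lambda n: n % a == 0 and n % b == 0)
print(p_ab, p_a * p_b)
```

The printed pairs should agree to several decimal places; the small discrepancy comes from the truncation and shrinks as `N` grows. Note that $4$ and $9$ are not prime, only relatively prime, which is all the independence argument needs.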



(I learned this from Gian-Carlo Rota, Combinatorial Snapshots.)



Under "suitable conditions" (Rota doesn't say what they are, so I don't know), the density of a set $A$ of natural numbers is the limit $\lim_{s \to 1^+} P_s(A)$.



It might also be reasonable to define correlation between sets of natural numbers in the same way, as a limit. Let $A$ and $B$ be two sets of natural numbers, and let $X$ and $Y$ be the indicator random variables of $A$ and $B$ under the measure $P_s$. The Pearson correlation coefficient between $X$ and $Y$ is
$$\frac{E(XY) - E(X)E(Y)}{\sigma_X \sigma_Y}$$


where $E$ denotes expectation and $\sigma$ denotes standard deviation. This simplifies when $X$ and $Y$ are indicators (and thus take only the values 0 and 1): since $E(X) = P_s(A)$, $E(Y) = P_s(B)$, $E(XY) = P_s(A \cap B)$ and $\sigma_X^2 = P_s(A)(1 - P_s(A))$, it becomes
$$\frac{P_s(A \cap B) - P_s(A) P_s(B)}{\sqrt{P_s(A) P_s(B) (1 - P_s(A)) (1 - P_s(B))}}$$

We could then define the correlation between $A$ and $B$ to be the limit of this as $s \to 1^+$.
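To check that the simplified formula agrees with the general Pearson formula, here is a sketch that computes both for the indicators of "divisible by 4" and "divisible by 6", under a truncated, renormalised version of $P_s$. The truncation at `N`, the choice `s = 2`, and the function names `truncated_weights`, `weighted_pearson` and `indicator_correlation` are illustrative assumptions.

```python
import math


def truncated_weights(s, N):
    """Approximate P_s by the weights n**(-s) on 1..N, renormalised to sum to 1."""
    raw = [n ** (-s) for n in range(1, N + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def weighted_pearson(weights, xs, ys):
    """General Pearson correlation of xs and ys under probability weights."""
    ex = sum(w * x for w, x in zip(weights, xs))
    ey = sum(w * y for w, y in zip(weights, ys))
    exy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    var_x = sum(w * (x - ex) ** 2 for w, x in zip(weights, xs))
    var_y = sum(w * (y - ey) ** 2 for w, y in zip(weights, ys))
    return (exy - ex * ey) / math.sqrt(var_x * var_y)


def indicator_correlation(p_a, p_b, p_ab):
    """Simplified formula for indicators, from P(A), P(B) and P(A and B)."""
    return (p_ab - p_a * p_b) / math.sqrt(p_a * p_b * (1 - p_a) * (1 - p_b))


s, N = 2.0, 100_000
w = truncated_weights(s, N)
ns = range(1, N + 1)
xs = [1 if n % 4 == 0 else 0 for n in ns]  # indicator of A: divisible by 4
ys = [1 if n % 6 == 0 else 0 for n in ns]  # indicator of B: divisible by 6

p_a = sum(wi * x for wi, x in zip(w, xs))
p_b = sum(wi * y for wi, y in zip(w, ys))
p_ab = sum(wi * x * y for wi, x, y in zip(w, xs, ys))

print(weighted_pearson(w, xs, ys))
print(indicator_correlation(p_a, p_b, p_ab))
```

The two printed values should agree up to floating-point rounding, since for a 0/1 variable $E(X^2) = E(X)$ and so the variance is just $P(A)(1 - P(A))$.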



In the case where $A$ is the event "divisible by 2", for example, and $B$ is the event "divisible by 3", $A \cap B$ is the event "divisible by 6". So $P_s(A \cap B) = 6^{-s}$, $P_s(A) = 2^{-s}$ and $P_s(B) = 3^{-s}$; the numerator is $6^{-s} - 2^{-s} 3^{-s} = 0$, and so the correlation is zero.



But in the case where $A$ is the event "divisible by 4" and $B$ is the event "divisible by 6", $A \cap B$ is the event "divisible by 12". So the correlation with respect to $P_s$ is
$$\frac{12^{-s} - 24^{-s}}{\sqrt{4^{-s} \, 6^{-s} (1 - 4^{-s})(1 - 6^{-s})}}$$


which has the limit $1/\sqrt{15}$ as $s \to 1^+$; more generally, the limiting correlation between being divisible by $a$ and being divisible by $b$ is
$$\frac{ab - \operatorname{lcm}(a,b)}{\operatorname{lcm}(a,b) \sqrt{(a-1)(b-1)}}$$

and this may or may not be what you want.
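Since $P_s$ of the multiples of $k$ is exactly $k^{-s}$, the correlation between divisibility by $a$ and by $b$ can be computed in closed form for every $s$, with no truncation. The sketch below evaluates it for $s$ approaching 1 and compares it with the limiting formula. The function names are mine, and it assumes $a, b \ge 2$ so that the standard deviations are nonzero.

```python
import math


def lcm(a, b):
    """Least common multiple (math.lcm also exists from Python 3.9 on)."""
    return a * b // math.gcd(a, b)


def divisibility_correlation(a, b, s):
    """Correlation under P_s between 'divisible by a' and 'divisible by b'.

    Uses P_s(multiples of k) = k**(-s); the integers divisible by both a and b
    are exactly the multiples of lcm(a, b). Assumes a, b >= 2. The expression
    also makes sense at s = 1, where it equals the s -> 1+ limit."""
    p_a, p_b, p_ab = a ** (-s), b ** (-s), lcm(a, b) ** (-s)
    return (p_ab - p_a * p_b) / math.sqrt(p_a * p_b * (1 - p_a) * (1 - p_b))


def limiting_correlation(a, b):
    """The s -> 1+ limit: (ab - lcm(a,b)) / (lcm(a,b) * sqrt((a-1)(b-1)))."""
    m = lcm(a, b)
    return (a * b - m) / (m * math.sqrt((a - 1) * (b - 1)))


for s in (2.0, 1.5, 1.1, 1.01, 1.001):
    print(s, divisibility_correlation(4, 6, s))
print("limit:", limiting_correlation(4, 6), "1/sqrt(15):", 1 / math.sqrt(15))
print("coprime 2, 3:", divisibility_correlation(2, 3, 1.5))
```

The values for $a = 4$, $b = 6$ should approach $1/\sqrt{15}$ as $s$ decreases towards 1, and for relatively prime $a, b$ the correlation is zero up to floating-point rounding, matching the independence result.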
