A few questions were asked, so a few answers will be given. (main point: likelihood is not necessarily a product density, though this is the common interpretation.)
Frequently, the likelihood is written as the product of density values evaluated at the observed examples. Since the examples are drawn i.i.d., this product of densities is itself a density: the density of the corresponding product measure on the product space. So yes, from this perspective, you have constructed a product density.
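To make that concrete, here is a minimal sketch (my own illustration, not something from the question): a Gaussian model on i.i.d. data, where the likelihood is literally the product of the per-example density values. The particular numbers and the function name "likelihood" are just placeholders.

# sketch: the likelihood of i.i.d. data under a Gaussian model is the product
# of the individual density values, i.e. the joint (product-measure) density
# evaluated at the sample.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=0.1, size=20)   # hypothetical i.i.d. sample

def likelihood(mu, sigma, xs):
    # product of per-example density values = density of the product measure
    return np.prod(norm.pdf(xs, loc=mu, scale=sigma))

print(likelihood(0.0, 0.1, data))  # can easily exceed 1 -- see below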
Since you are dealing with densities, not probabilities, values are not constrained to [0,1], and your density can easily be greater than one. In fact, if you are dealing with a Dirac measure (which puts all its mass on a single point of the real line), you essentially have "infinite" density. I put that in quotes because a Dirac measure is not a continuous probability measure: it has no density with respect to Lebesgue measure at all, let alone one with an infinite value at a point. (A quick sanity check: any candidate density would be nonzero only on a set of Lebesgue measure zero, so its integral with respect to Lebesgue measure would be zero, not one; that contradicts it being the density of a probability measure.) Perhaps a more apt example: any continuous distribution on [0, 0.5] must have density greater than one on a set of nonzero Lebesgue measure, simply because the density has to integrate to one over an interval of length one half. (You can try to construct a sequence of these converging to something that violates what I said, but the limit will be a distribution which is not continuous!)
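Here is a small check of that example (again my own illustration): the uniform distribution on [0, 0.5] has density 2 everywhere on its support, yet it still integrates to 1.

# the uniform distribution on [0, 0.5] has density value 2 on its support,
# but its total mass is still 1.
from scipy.stats import uniform
from scipy.integrate import quad

u = uniform(loc=0.0, scale=0.5)         # uniform on [0, 0.5]
print(u.pdf(0.25))                      # 2.0 -- a density value greater than one
print(quad(u.pdf, 0.0, 0.5)[0])         # ~1.0 -- the total probability is still 1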
Things can get a little confusing because you can write a discrete probability distribution as a density with respect to the measure that puts mass 1 on each point of the distribution's support (i.e., counting measure restricted to that set). Note that this is a density with respect to a measure which is NOT a probability measure. In any case, the density value at each point is exactly the probability of that point, which allows probability masses and densities to be used interchangeably, and that can be confusing.
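One more sketch of mine to illustrate the discrete case: the "density" with respect to counting measure on the support is just the pmf, so the density values are probabilities and the resulting likelihood is a product of numbers in [0, 1].

# for a discrete distribution, the density w.r.t. counting measure on the
# support is the pmf, so each factor in the likelihood is a probability.
from scipy.stats import bernoulli

p = 0.3
xs = [1, 0, 0, 1, 0]                    # hypothetical observed coin flips
lik = 1.0
for x in xs:
    lik *= bernoulli.pmf(x, p)          # pmf value = probability = density w.r.t. counting measure
print(lik)                              # at most 1, unlike the continuous case above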
I'll close with some further reading. A good book on machine learning is "A Probabilistic Theory of Pattern Recognition" by Devroye, Gyorfi, and Lugosi. Chapter 15 is on maximum likelihood, and you'll notice they do NOT define the likelihood as a product of probabilities or densities, but rather as a product of functions. Defining it that way is general enough to cover the differing interpretations, so they can ignore the interpretations there and just work out the math.