Regarding your intuition, it may or may not be true. In many cases people do the opposite, i.e. they downweight classes with low membership. This happens any time someone puts a prior $p(c)$ on the class membership.
If you still think you need to boost the probability of small classes, then using different weights for different classes may help. Define the weight of class $i$ to be $W_i \sim 1/p(c_i)$, where $p(c_i)$ is the prior probability of class $i$. Then pick the $i$ that maximizes something like $W_i \cdot p(c_i|MLP)$. Designing the weights has to be done empirically, I think.
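As a rough sketch of the weights, one could estimate the prior from how often each class appears in the training labels and invert it. The function name `inverse_prior_weights`, the use of label counts as the prior, and the rescaling to mean 1 are all assumptions made for illustration, not something from the question.

```python
from collections import Counter

def inverse_prior_weights(labels):
    """Return {class: W} with W proportional to 1 / p(class).

    Assumption: the prior p(c_i) is estimated as the fraction of
    training labels that belong to class c_i.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # p(c_i) = count_i / total, so 1 / p(c_i) = total / count_i.
    raw = {c: total / n for c, n in counts.items()}
    # Rescale so the weights average to 1; only their ratios
    # matter when comparing weighted scores across classes.
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

# Toy example: class "z" is nine times rarer than class "a".
weights = inverse_prior_weights(["a"] * 90 + ["z"] * 10)
```

With these toy counts the rare class gets nine times the weight of the common one. Whether that is too strong a boost is the kind of thing that would have to be checked empirically.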
For Tom:
MLP is multi-layer perceptron, a type of neural network.
The problem can be reformulated as follows. Suppose you have some data (like an image of a glyph). It can belong to one of several classes (say, an English character, 'a' to 'z', so 26 classes in this case). You magically get a set of probabilities $p(c_i | data)$ (the probability that your data is in class $c_i$). In our case, these are given by the MLP. You need to decide on a single class. The most obvious rule is to pick the $c_i$ for which $p(c_i | data)$ is maximal. But OP feels that if one class has very few members, it should be given an advantage. Another option is to have weights and pick the class that maximizes $W_i p(c_i | data)$. The disadvantage of this is that we don't have a principled method of choosing the weights $W_i$.
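The two decision rules can be compared in a few lines. This is only a sketch: `pick_class` is a hypothetical helper, and the posteriors and weights below are made-up numbers rather than output from a real MLP.

```python
def pick_class(posteriors, weights=None):
    """Return the class c_i that maximizes W_i * p(c_i | data).

    posteriors: {class: p(class | data)}, e.g. the MLP outputs.
    weights:    {class: W_i}; a class without a weight gets W_i = 1,
                so weights=None gives the plain argmax rule.
    """
    if weights is None:
        weights = {}
    return max(posteriors, key=lambda c: weights.get(c, 1.0) * posteriors[c])

posteriors = {"a": 0.6, "z": 0.4}  # made-up p(c_i | data)
weights = {"a": 0.2, "z": 1.8}     # made-up W_i favouring the rare class

plain = pick_class(posteriors)              # scores: a = 0.60, z = 0.40
weighted = pick_class(posteriors, weights)  # scores: a = 0.12, z = 0.72
```

Here the plain rule picks 'a' and the weighted rule picks 'z', which shows how much the answer can depend on the choice of $W_i$.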
One possibility is to use $W_i \sim 1/p(c_i)$. In this case, $p(c_i | data) / p(c_i) \propto p(data | c_i)$, so the rule amounts to picking the class under which the data is most likely. That seems somewhat principled (not completely arbitrary). But maybe there are better ways of doing that.
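The link between the weighted posterior and the likelihood follows from Bayes' rule, $p(c_i | data) = p(data | c_i) \, p(c_i) / p(data)$. Dividing both sides by the prior gives

$$
\frac{p(c_i | data)}{p(c_i)} = \frac{p(data | c_i)}{p(data)} \propto p(data | c_i)
$$

The factor $1/p(data)$ is the same for every class, so it does not change which $i$ wins. This assumes the MLP outputs really approximate $p(c_i | data)$ and that $p(c_i)$ matches the class frequencies the network was trained on.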