Tuesday, 22 September 2009

matrices - Matrix Factorization Model

I'll give an example from politics. Let's say you have a legislative body, such as the House of Representatives in the U. S. Congress. Over a period of time, the members of the House will vote on many bills and thereby accrue a voting history. Let's encode votes as numbers: a vote for a bill is 1, a vote against a bill is -1, and an abstention (no vote) is 0. Also, let's label the representatives $R_1,ldots,R_m$, and label the bills $B_1,ldots,B_n$. We thus have, for each pair $i,j$ of numbers with $0 <ileq m$, $0 < j leq n$, a vote $V_{ij}in {-1,0,1}$, namely how congressperson $R_i$ voted on bill $B_j$. This gives us a big $mtimes n$ "vote matrix" $V$ whose entries are the votes $V_{ij}$.



Now, it's true that each congressperson has an opinion on every bill, or dually, that each bill appeals to different congresspeople in different amounts (possibly negative). However, that description misses an important aspect of the situation: a congressperson's voting behavior can be approximated using much fewer parameters than a list of all his votes. Also, a bill's tendency to appeal to different people can be approximated using much fewer parameters than a list of all the people who voted for and against it. Indeed, it's not really bills that congresspeople have opinions on; it's issues and policies. On the flip side, a given bill will implement various types of policies and address various issues, and that's really what determines who will like it and how much.



Thus, to describe voting behaviors, what you really need is (1) a list of policies $P_1,ldots,P_f$ (the latent factors), (2) for each congressperson $R_i$, a degree of preference $S_{ik}$ for each policy $P_k$, and (3) for each bill $B_j$, a value $C_{kj}$ describing to what extent it implements policy $P_k$. Let's continue the convention that positive values for $S_{ik}$ or $C_{kj}$ indicate accordance and negative values indicate opposition. To each congressperson $R_i$, we can assign the vector $S_i = (S_{i1},ldots,S_{if})$ (which we might call the "policy vector" for that congressperson), and to each bill $B_j$, we can assign the vector $C_j = (C_{1j},ldots,C_{fj})$ (which we might also call the "policy vector" for that bill).



For what follows, I'm going to use a very simplistic (but somewhat plausible) mathematical model. (Your article uses a more complicated and more realistic model, taking bias into account, for example.) Also, the model makes a lot more sense if congresspeople are asked to state their degree of preference for each bill rather than simply voting "yes" or "no," so that the "votes" $V_{ik}$ take values in $mathbb{R}$. When deciding how to vote on a bill, a congressperson may consider how well it correlates with her opinions and make a decision based on that. With a lot of vigorous hand waving and wishful thinking, the outcome of this process can be described very simply in terms of policy vectors: the vote $V_{ij}$ is simply the dot product $S_icdot C_j = sum_k S_{ik} C_{kj}$. (I'll leave it as an exercise to show that's not completely ridiculous, even if unlikely to be exactly true.) Another way of saying this is that if $S$ is the matrix with entries $S_{ik}$ and $C$ is the matrix with entries $C_{kj}$, then $V = SC$. In other words, our knowledge about legislative policies as latent factors induces a factorization of the matrix $V$.



One merit of the above approach is that although it's a bit too simple, it leads to well understood mathematics. For it to be useful, the number of policies $f$ should be much smaller than $m$ and $n$, the numbers of congresspeople and bills. In that case, the factorization $V = SC$ means that $V$ has rank $f$, which is small compared to its dimensions. In practice, admitting that the description in terms of policies as latent factors can only be a good approximation, not exact, this means that $V$ is well approximated by a low-rank matrix. Factorizations can be obtained from that observation alone using standard matrix tools like the singular-value decomposition. In particular, the policy vectors can be found even before you have any idea what the "policies" should be. (In other words, you don't have to sit down and make a list of policies you think are important and figure out what the policy vectors must be from that; you can use a standard algorithm which will determine the policies for you. Of course, it won't name the policies, but if you need to, you can compare policy vectors to figure out what real-world policies the algorithmically extracted policies approximate.)

No comments:

Post a Comment