Monday, 20 February 2012

st.statistics - Building a multi-variable regression model

An input dimension of 18 is a little problematic, so I have a few suggestions. I'm putting the statistical approach first, given your choice of tags and terminology.



Linear regression can be made to fit polynomials by explicitly creating all the monomial terms; i.e., for an input with two dimensions $x = (x_1, x_2)$, for a quadratic you'd instead use $x' = (1, x_1, x_2, x_1x_2, x_1^2, x_2^2)$. Notice that this means, to add all $d$th-order terms, you would create $O(18^d)$ dimensions, and that your regression model has equivalently many parameters! It may therefore be beneficial to simplify the model with some regularization, e.g. the lasso ($\ell_1$-regularization), which drives most of the monomial weights to exactly zero; see the sketch below. Note that, as specified, this does not minimize the degree of the polynomial: the regularization only minimizes the $\ell_1$ norm of the weights, and even a plain sparsity constraint on the weights (which is simpler than minimizing degree) is nonconvex. To find the smallest degree, you could binary search; i.e., try degrees $1, 2, 4, 8, \ldots$ until you get zero error, and then binary search within the last interval to find the exact order.
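Here's a minimal sketch of the monomial-expansion-plus-lasso idea using scikit-learn's PolynomialFeatures and Lasso. The data arrays and the penalty strength `alpha` are hypothetical stand-ins; you'd substitute your own 18-dimensional inputs and tune `alpha` (e.g. by cross-validation).

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: 200 samples in 18 dimensions, stand-ins for the real inputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 18))
y = X[:, 0] ** 2 - 3 * X[:, 1] * X[:, 2] + 0.01 * rng.standard_normal(200)

degree = 2  # already ~190 monomial features at degree 2; grows as O(18^d)
model = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False),
    Lasso(alpha=0.1),  # l1 penalty drives most monomial weights to exactly zero
)
model.fit(X, y)

# Count the monomials that survive the l1 penalty.
coefs = model.named_steps["lasso"].coef_
print("nonzero monomial terms:", np.count_nonzero(coefs))
```

The pipeline makes the dimension blow-up concrete: each increment of `degree` multiplies the feature count roughly by 18, which is why sparsity-inducing regularization helps here.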



Another approach is to use polynomial interpolation directly. Simply grow $d$, building an interpolating polynomial of degree $d$ at each iteration, and stop when the polynomial built on the provided points fits all the other points; a sketch of this degree search appears below. For univariate data, the Lagrange polynomial [1] is the way to go. I don't know your multivariate case offhand, but Wikipedia has a "polynomial interpolation" page which should be helpful.
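Below is a minimal univariate sketch of the grow-then-binary-search degree strategy. The data is hypothetical (samples of an unknown cubic), and numpy.polyfit is used as a stand-in for an explicit Lagrange construction: a least-squares fit recovers the polynomial exactly whenever the data really is polynomial of at most the tried degree.

```python
import numpy as np

# Hypothetical univariate data: samples of an unknown cubic.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 30)
y = 2 * x**3 - x + 5

# Split into fitting points and held-out check points.
x_fit, y_fit = x[:15], y[:15]
x_chk, y_chk = x[15:], y[15:]

def fits(deg):
    # Fit on x_fit, then test whether the held-out points are reproduced.
    coeffs = np.polyfit(x_fit, y_fit, deg)
    return np.allclose(np.polyval(coeffs, x_chk), y_chk, atol=1e-6)

# Grow the degree geometrically (1, 2, 4, 8, ...) until the fit succeeds.
d = 1
while not fits(d):
    d *= 2

# The smallest fitting degree now lies in (d/2, d]; binary search for it.
lo, hi = d // 2 + 1, d
while lo < hi:
    mid = (lo + hi) // 2
    if fits(mid):
        hi = mid
    else:
        lo = mid + 1
print("smallest fitting degree:", lo)  # prints 3 for this cubic
```

The geometric growth phase uses $O(\log d^*)$ fits to bracket the true degree $d^*$, and the binary search uses $O(\log d^*)$ more, matching the doubling-then-search scheme described above.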



Be warned that anything you try will be slow, due to the dimension, your stipulation that you find the absolute smallest degree, and the vast number of parameters in any such polynomial.



[1] "Lagrange polynomial", Wikipedia: https://en.wikipedia.org/wiki/Lagrange_polynomial
