GSOC 2012 and scikit-learn: August 2012

The end of GSOC is approaching with lightning speed and there is still work to be done.

I'm currently implementing a method (Strong Rules [0]) to predict which coefficients of a regularized linear regression model will be zero at the solution.
Knowing this makes it possible to remove the corresponding features (variables) from the data set. The reduced data set can lead to a much smaller problem that can be solved a lot faster than the original one.
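To make the idea concrete, here is a minimal sketch of the basic strong rule from [0]. It is only an illustration, not the code I am working on for scikit-learn. The function name `basic_strong_rule` and the toy data are made up for this example, and the sketch assumes the lasso objective 0.5 * ||y - Xw||^2 + lam * ||w||_1 with a centred y and standardised columns of X. The rule discards feature j whenever |x_j^T y| < 2 * lam - lam_max, where lam_max is the largest |x_j^T y|.

```python
import numpy as np


def basic_strong_rule(X, y, lam):
    """Boolean mask of the features that survive the basic strong rule.

    Hypothetical helper for illustration only. Assumes the objective
    0.5 * ||y - X w||^2 + lam * ||w||_1, centred y, standardised X.
    """
    corr = np.abs(X.T @ y)      # |x_j^T y| for every feature j
    lam_max = corr.max()        # smallest lam for which all coefficients are zero
    # Feature j is discarded when |x_j^T y| < 2 * lam - lam_max.
    return corr >= 2.0 * lam - lam_max


# Toy problem: 200 features, only the first three carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + 0.1 * rng.standard_normal(50)
y = y - y.mean()

lam = 0.8 * np.abs(X.T @ y).max()
keep = basic_strong_rule(X, y, lam)
X_reduced = X[:, keep]          # the smaller problem handed to the solver
print(X.shape[1], "features before,", X_reduced.shape[1], "after screening")
```

The rule is a heuristic, so it can occasionally discard a feature whose coefficient is actually non-zero. The solution of the reduced problem therefore has to be checked against the optimality (KKT) conditions of the full problem, and any violating features added back. Note also that scikit-learn scales the squared loss by 1 / (2 * n_samples), so its `alpha` corresponds to lam / n_samples in the notation used here.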

The method works correctly according to the tests I have implemented. These tests now give valuable feedback during the ongoing refactoring. The refactoring has two goals: to make the implementation integrate nicely with the existing code, and to turn the potential speedup into reality.

The figure above shows that the implementation with strong rule filtering is still slower than the version without it. My primary goal for the remaining days is to bring the red line significantly below the blue line.

[0] Tibshirani, R., J. Bien, J. Friedman, T. Hastie, N. Simon, J. Taylor, and R.J. Tibshirani. “Strong Rules for Discarding Predictors in Lasso-type Problems.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2011).

Tuesday, 14 August 2012

Burst Of Speed