Rely on GLMs from scikit-learn or glum #50

Open
lorentzenchr opened this issue Apr 28, 2024 · 2 comments

@lorentzenchr

Dear pysal/spglm maintainers,

I would like to propose replacing the underlying GLM solver with either scikit-learn or glum, while keeping the API you have built on top.

This way, you would gain very reliable, well-tested GLM solvers and lower your own maintenance costs. From a community perspective, it would also be a step towards working together, instead of many projects implementing the same things over and over again.

This is just a proposal for discussion.

Full disclosure: I'm a scikit-learn core developer.

@TaylorOshan
Collaborator

This is an interesting proposal, @lorentzenchr, thanks for opening this issue. We would definitely be open to it and are happy to discuss. I can provide some context.

This package was originally developed to support the spint package that is also part of the pysal ecosystem. The main reason we coded these GLM routines ourselves (or, more truthfully, borrowed code from statsmodels) was that we planned to modify them heavily to support the sparse data structures needed to optimize the estimation of regressions coming from spint, where you might have thousands of fixed effects coded as dummy variables in the design matrix. So while the sp in spint is for spatial, the sp in spglm is for sparse. To the extent that scikit-learn or glum could potentially continue supporting the sparse structures, that would definitely be a win.

The spglm package also supports the mgwr package, and there are instances where we have customized (and plan, in the near future, to make further customizations to) the GLM routine in ways that might be trickier to do with the more reliable solvers.

The last tidbit is that some of us use the code for teaching and research; although the current routines may be less reliable, they may also be more approachable, though that may be an incorrect assumption.

A few of us were able to discuss this briefly, and we thought it might be possible to offer one of these solvers as an option (possibly the default) while maintaining the previous implementation.

We would be open to collaborating on this if it was something that you wanted to take a shot at. I'm not sure we currently have the bandwidth to tackle it ourselves, but we can definitely see the potential benefits and are happy to continue discussing and assist.

@lorentzenchr
Author

To the extent that scikit-learn or glum could potentially continue supporting the sparse structures, that would definitely be a win.

Both support sparse feature matrices and will continue to do so.
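For illustration, here is a minimal sketch (with made-up data) of fitting a Poisson GLM on a sparse, dummy-coded design matrix with scikit-learn; the group structure and parameter choices are just placeholders:

```python
# Minimal sketch (made-up data): a Poisson GLM on a sparse, dummy-coded design
# matrix. scikit-learn's GLM estimators accept scipy.sparse inputs directly,
# so the dummy columns never need to be densified.
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
n_obs, n_groups = 10_000, 2_000

# One dummy column per fixed effect (e.g. origin/destination dummies in spint).
groups = rng.integers(0, n_groups, size=n_obs)
X = sp.csr_matrix(
    (np.ones(n_obs), (np.arange(n_obs), groups)), shape=(n_obs, n_groups)
)
y = rng.poisson(lam=2.0, size=n_obs)

# fit_intercept=False avoids collinearity with the full set of dummies.
model = PoissonRegressor(alpha=0.0, fit_intercept=False, max_iter=300)
model.fit(X, y)  # X stays sparse throughout
print(model.coef_.shape)  # (2000,)
```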

The spglm package also supports the mgwr package, and there are instances where we have customized

My understanding is that mgwr depends on spglm, so mgwr is a downstream package of spglm. What sort of customization do you do in spglm?
Both scikit-learn and glum offer the general API ModelClass(param1=.., param2=.., ...).fit(X, y, sample_weight).
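As a rough sketch of that shared pattern for a Poisson model (made-up data; the glum parameter names family/alpha and the fit(..., sample_weight=...) signature are taken from its documented interface, so please double-check against your glum version):

```python
# Sketch: the same estimator pattern in scikit-learn and glum for a Poisson GLM.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from glum import GeneralizedLinearRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = rng.poisson(lam=np.exp(X @ np.array([0.2, -0.1, 0.3])))
w = np.ones_like(y, dtype=float)

# scikit-learn: ModelClass(params).fit(X, y, sample_weight)
sk_model = PoissonRegressor(alpha=0.0).fit(X, y, sample_weight=w)

# glum: same pattern, with the distribution chosen via the family parameter
glum_model = GeneralizedLinearRegressor(family="poisson", alpha=0.0).fit(
    X, y, sample_weight=w
)

print(sk_model.coef_, glum_model.coef_)
```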

We would be open to collaborating on this if it was something that you wanted to take a shot at

I can't invest in this with code or PRs. I thought that, in the long run, you would gain the most from this change, in particular in terms of bug fixes and maintenance. And, from glancing at your code, also from simply better solvers (e.g., you don't have a line search in place).

I'm not sure we currently have the bandwidth to tackle it ourselves, but we can definitely see the potential benefits and are happy to continue discussing and assist.

I am curious to see where it goes.
