Move sampledata files to pip/conda installable package #13856

Closed
bryevdv opened this issue Apr 30, 2024 · 6 comments · Fixed by #13874

Comments

@bryevdv
Member

bryevdv commented Apr 30, 2024

Moving the sampledata to an installable package will afford the following:

  • bokeh_sampledata can depend on pandas, potentially allowing the main package to drop pandas
  • having an installable package may help with pyodide usage (however, it will be a large package... likely tens of MB)
  • custom S3 download machinery can be removed from the main repo

I think the simplest thing to do will be to keep all the existing bokeh.sampledata.foo modules as shims, and the ones that need to can try to access their data from the separate bokeh_sampledata module. If bokeh_sampledata cannot be imported then an actionable error can be raised. This approach will avoid any changes to examples and docs, which would otherwise be significant if we want to rip bokeh.sampledata out of the main package entirely.
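A minimal sketch of the "actionable error" part of this idea, for illustration only. The helper name `load_external_sampledata` and the wording of the install hint are assumptions, not code from the Bokeh repository; the `package` parameter exists only so the sketch can be exercised without bokeh_sampledata installed.

```python
import importlib
from types import ModuleType


def load_external_sampledata(name: str, package: str = "bokeh_sampledata") -> ModuleType:
    """Import ``<package>.<name>``; raise an actionable error if the package is absent.

    Hypothetical helper: the name and the error text are illustrative only.
    """
    try:
        return importlib.import_module(f"{package}.{name}")
    except ModuleNotFoundError as e:
        # Translate the error only when the top-level package itself is missing.
        # A missing submodule or a broken dependency should surface unchanged.
        if e.name != package:
            raise
        raise RuntimeError(
            f"Sample data '{name}' requires the separate '{package}' package. "
            f"Install it, for example with: pip install {package}"
        ) from e
```

A shim module could call this helper at the point where it needs its data, so that importing the shim stays cheap and the error only appears when the data is actually requested.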

If any of the "small" datasets currently shipped with the package depend on pandas they will need to be moved. Or maybe we just move all the actual data, but keep the bokeh.sampledata shim module structure intact.

cc @philippjfr

@bryevdv
Member Author

bryevdv commented May 2, 2024

I have most of a bokeh_sampledata repo set up. I'll push a new repo to GitHub once tests are moved over and package builds are working.

@bokeh/dev would it be sufficient to just generate a PyPI package for this? Or do we need to also generate a conda package?

@philippjfr
Contributor

Thanks for pushing forward with this so quickly. I'm currently a little swamped but hope to take a closer look early next week. A conda package would certainly be nice.

@bryevdv
Member Author

bryevdv commented May 3, 2024

@bokeh/dev the new repository is here https://github.com/bokeh/bokeh_sampledata

Still need to set up CI, etc. If there is any more modern way to streamline package building and publishing I am all ears.

@bryevdv
Member Author

bryevdv commented May 3, 2024

As for how to integrate this here with a minimum of disruption, I have a vague idea of creating shim files in this repo that use module __getattr__ and __dir__ to make everything appear under bokeh.sampledata as it does currently. I haven't actually tried to make that work yet, though, especially with the docs build.
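A rough sketch of how module-level `__getattr__` and `__dir__` (PEP 562) can forward lookups to another module. It is not the code from the eventual PR. To stay runnable in a single file it builds the shim as a `ModuleType` object and uses the standard-library `math` module as a stand-in target; in a real shim file the two functions would simply be defined at the top level of the module. The name `make_shim` is hypothetical.

```python
import importlib
from types import ModuleType
from typing import Any


def make_shim(shim_name: str, target_name: str) -> ModuleType:
    """Build a module object whose public attributes come from ``target_name``."""
    shim = ModuleType(shim_name)

    def __getattr__(attr: str) -> Any:
        # Invoked only when normal attribute lookup on the shim fails.
        if attr.startswith("__"):
            # Do not forward dunder probes made by the import system or tooling.
            raise AttributeError(f"module {shim_name!r} has no attribute {attr!r}")
        target = importlib.import_module(target_name)  # imported lazily, on first use
        try:
            return getattr(target, attr)
        except AttributeError:
            raise AttributeError(
                f"module {shim_name!r} has no attribute {attr!r}"
            ) from None

    def __dir__() -> list:
        # Advertise the target's public names so dir() and tab completion work.
        target = importlib.import_module(target_name)
        return sorted(n for n in dir(target) if not n.startswith("_"))

    # PEP 562 looks these two hooks up in the module's own namespace.
    vars(shim)["__getattr__"] = __getattr__
    vars(shim)["__dir__"] = __dir__
    return shim


# Stand-in usage: "math" plays the role of the external data module here.
demo = make_shim("demo_shim", "math")
print(demo.sqrt(9.0), "sqrt" in dir(demo))
```

Because the target is imported inside the hooks rather than at shim import time, a missing target package only becomes an error when an attribute is actually requested.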

I was also considering date-based versioning for the new package.

@hoxbro
Contributor

hoxbro commented May 3, 2024

I have a vague idea of creating shim files in this repo that use module __getattr__ and __dir__ to make everything appear under bokeh.sampledata as it does currently.

This should work. You will also need an if TYPE_CHECKING: block containing the actual imports, so that the names are discoverable by LSPs.

@bryevdv
Member Author

bryevdv commented May 6, 2024

@hoxbro I am not sure I follow your comment. Please have a look at the actual PR and let me know if there are specific changes you think are needed.

4 participants