[FEATURE] Make pandas optional dependency with polars as alternative #13780

Open
deanm0000 opened this issue Mar 25, 2024 · 9 comments

@deanm0000

Problem description

It'd be nice if pandas was an optional dependency with polars as an alternative engine. It has the benefit of being faster than pandas for almost everything and it also would allow polars users to have smaller package sizes by not requiring pandas.

Feature description

Have alternative methods that use polars instead of pandas. There could be a transition phase before all methods are bilingual where, if the input is a polars df but the polars version of the bokeh method hasn't been written yet then the user gets a warning that it's being converted to pandas to finish the task. If they don't have pandas installed they'd, of course, get an error. That warning might motivate more people to contribute to making those methods.
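The transition-phase fallback described above could look roughly like the following. This is only a sketch: `to_pandas_with_warning` and its `method_name` argument are hypothetical names, not part of Bokeh. The only library call assumed is polars' `DataFrame.to_pandas()`, which needs pandas (and pyarrow) to be installed.

```python
import warnings


def to_pandas_with_warning(data, method_name):
    """Hypothetical helper: convert polars input to pandas, with a warning.

    Anything that is not a polars object is returned unchanged.
    """
    # Detect polars by module name so polars itself never has to be imported here.
    top_level_module = type(data).__module__.split(".")[0]
    if top_level_module != "polars":
        return data

    warnings.warn(
        f"{method_name} has no native polars implementation yet; "
        "converting the input to pandas",
        UserWarning,
        stacklevel=2,
    )
    try:
        # polars.DataFrame.to_pandas() fails with an import error without pandas.
        return data.to_pandas()
    except ImportError as exc:
        raise ImportError(
            f"{method_name} currently needs pandas to handle polars input"
        ) from exc
```

A method that already has a polars code path would simply skip this helper, so the warning disappears method by method as contributions land.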

Potential alternatives

live with pandas conversions and dependencies :(

Additional information

No response

@bryevdv
Member

bryevdv commented Mar 25, 2024

IMO we could certainly potentially expand built-in polars support, especially if an interested polars-user decided to step up and take ownership of the Polars experience in Bokeh. But I would not support removing pandas as a dependency any time soon. Having "optional" dependencies where things work or not depending on what happens to be installed, i.e.

> If they don't have pandas installed they'd, of course, get an error.

is a maintenance and support mess, and I would be very opposed to going back to that.

The good news is, those are two entirely separate questions, and one (keeping pandas as a dependency) does not preclude the other (improving UX for polars users).

@MarcoGorelli
Contributor

I was looking into this (I'm working with a client who's working in a constrained environment, dependency-wise) and would like to be able to produce plots of numeric data and save them.

NumPy + bokeh (without pandas) would be ~170MB, but including a dataframe library (either pandas or Polars) would push me above the AWS Lambda limit.

> where things work or not depending on what happens to be installed

Agree! And the good news is that that's not the case here. Bokeh only needs to import pandas if the user passes a pandas object (as far as I can tell), so there would be no situation in which something currently works but would start erroring if pandas weren't required.

I just tried this locally, and by just importing pandas lazily, I can get

from bokeh.plotting import figure, show
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 2, 4, 5])

p = figure(title="Simple line example", x_axis_label='x', y_axis_label='y')
p.line(x, y, legend_label="Temp.", line_width=2)
show(p)

to run just fine without having pandas installed
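For illustration, one common way to import pandas lazily is to look it up in `sys.modules` instead of importing it at module level. The sketch below is not taken from the draft PR; `is_pandas_dataframe` is a hypothetical helper name. It relies on the fact that a user can only pass a pandas object if pandas has already been imported somewhere in the process.

```python
import sys


def is_pandas_dataframe(obj):
    """Hypothetical check that never triggers an import of pandas."""
    # If pandas was never imported, obj cannot be a pandas DataFrame.
    pd = sys.modules.get("pandas")
    return pd is not None and isinstance(obj, pd.DataFrame)
```

With a check like this, NumPy-only code paths never touch pandas, while pandas inputs are still recognised whenever the user has pandas installed and imported.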

Regarding maintenance load - I think it would work to keep it as a test dependency, even if it's not a required runtime dependency?

I just put up a draft PR to demo what this would look like: #13849. It's not ready yet, and I need to study the codebase more carefully, but I'd like to ask whether you'd be open to considering this.

(regardless of what decision is taken here, I'd like to thank you for your work on Bokeh!)

@bryevdv
Member

bryevdv commented Apr 28, 2024

> Bokeh only needs to import pandas if the user passes a pandas object (as far as I can tell)

hexbin requires pandas unconditionally, possibly other things at this point.

> It's not ready yet

FYI what you describe is already what we used to do for a very long time, and explicitly stopped doing in #12369. It will be considerably more work (closer to 100 modules) to undo. I'm personally 👎 on going back but perhaps you can convince the rest of @bokeh/core

> I think it would work to keep it as a test dependency, even if it's not a required runtime dependency?

That's not sufficient. If you claim that everything works without it installed, you have to explicitly test under those conditions.
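As a rough illustration of what "testing under those conditions" involves, the sketch below simulates a missing pandas inside a single process. It is an assumption-laden approximation, not Bokeh's test setup: `pandas_blocked` is a hypothetical helper, and a real CI job would more reliably use an environment where pandas is not installed at all. It relies on the documented behaviour that a `None` entry in `sys.modules` makes the corresponding `import` raise `ImportError`.

```python
import contextlib
import sys


@contextlib.contextmanager
def pandas_blocked():
    """Hypothetical helper: make `import pandas` fail inside the block."""
    # Remember any already-imported pandas modules so they can be restored.
    saved = {
        name: module
        for name, module in sys.modules.items()
        if name == "pandas" or name.startswith("pandas.")
    }
    for name in saved:
        del sys.modules[name]
    # A None entry makes the import statement raise ImportError.
    sys.modules["pandas"] = None
    try:
        yield
    finally:
        del sys.modules["pandas"]
        sys.modules.update(saved)
```

Note the limitation: modules that imported pandas before the block keep their reference to it, which is one reason a separate pandas-free environment is the stronger test.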

@MarcoGorelli
Contributor

Thanks for your response!

Is hexbin the only such function? If so, it seems a bit of a pity - from a user perspective - to require pandas everywhere just for that

I'll close the WIP PR then and won't take it to completion - all the best with Bokeh, thanks again for what you do 🙏

@philippjfr
Contributor

Given the growing popularity of alternative tabular data libraries like Polars I think it's reasonable to consider a longer term vision where we make pandas an optional dependency again.

Polars has, for example, adopted Bokeh as the default plotting backend via hvPlot, and it would be nice if that could eventually be independent of pandas. Right now this requires pandas anyway, both because of dependencies in Panel, HoloViews and hvPlot and because Polars data is converted to pandas before plotting. But as we move towards eliminating those dependencies, I would be in favor of doing the same in Bokeh.

@philippjfr
Contributor

I'll try to narrow down exactly where in Bokeh pandas is a hard dependency rather than merely being used for supporting Pandas data.
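One possible starting point for that kind of audit is to scan source files for module-level pandas imports, since those are the ones that fail at import time when pandas is missing. The sketch below is an assumption about how one might do it, not a description of any existing Bokeh tooling; `toplevel_pandas_imports` is a hypothetical name. It deliberately ignores imports nested inside functions, `if TYPE_CHECKING:` blocks, or `try`/`except`, which would need a separate look.

```python
import ast


def toplevel_pandas_imports(source):
    """Return line numbers of module-level pandas imports in `source`."""
    hits = []
    # Only direct children of the module are inspected, so imports inside
    # functions, classes, or conditional blocks are not reported.
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        if any(n == "pandas" or n.startswith("pandas.") for n in names):
            hits.append(node.lineno)
    return hits
```

Running this over each file's text (for example via `pathlib.Path.read_text`) gives a first list of candidate modules to inspect by hand.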

@bryevdv
Member

bryevdv commented Apr 29, 2024

If we do go back down this path then I think we must also change how we handle all the sampledata. Namely, it should go in a separate pip/conda installable package that can depend on pandas, since most of the sampledata modules depend on pandas. Then, as a bonus, we can also jettison all the s3 sampledata download machinery in the main repo.

@philippjfr
Contributor

Good plan, in fact I'd like to look at the sample data configuration anyway for pyodide.

@bryevdv
Member

bryevdv commented Apr 30, 2024

I've made a separate issue for the sampledata, since it seems advisable to do regardless.


4 participants