[FEATURE] Make pandas optional dependency with polars as alternative #13780
Comments
IMO we could certainly expand built-in Polars support, especially if an interested Polars user decided to step up and take ownership of the Polars experience in Bokeh. But I would not support removing pandas as a dependency any time soon. Having "optional" dependencies where things work or not depending on what happens to be installed, i.e.
is a maintenance and support mess, and I would be very opposed to going back to that. The good news is that those are two entirely separate questions, and one (keeping pandas as a dependency) does not preclude the other (improving UX for Polars users).
I was looking into this (I'm working with a client who's working in a constrained environment, dependency-wise) and would like to be able to produce and save plots of numeric data. NumPy + Bokeh (without pandas) would be ~170MB, but including a dataframe library (either pandas or Polars) would push me above the AWS Lambda limit.
Agree! And the good news is that that's not the case here. Bokeh only needs to import pandas if the user passes a pandas object (as far as I can tell), so there would be no situation in which something currently works but would start erroring if pandas weren't required. I just tried this locally, and by importing pandas lazily, I can get

```python
from bokeh.plotting import figure, show
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 2, 4, 5])

p = figure(title="Simple line example", x_axis_label='x', y_axis_label='y')
p.line(x, y, legend_label="Temp.", line_width=2)
show(p)
```

to run just fine without having pandas installed.

Regarding maintenance load: would it work to keep pandas as a test dependency, even if it's not a required runtime dependency? I just put up a draft PR to demo what this would look like: #13849. It's not ready yet, as I need to study the codebase more carefully; I'd just like to ask whether you'd be open to considering this. (Regardless of what decision is taken here, I'd like to thank you for your work on Bokeh!)
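One common way to get this "only touch pandas if the user passed a pandas object" behavior is to look pandas up in `sys.modules` instead of importing it: if pandas was never imported in the process, the caller cannot be holding a pandas object. The sketch below assumes that approach; `is_pandas_dataframe` and `column_names` are hypothetical helpers, not Bokeh's actual API, and not necessarily what the draft PR does.

```python
import sys


def is_pandas_dataframe(obj) -> bool:
    """Return True if obj is a pandas DataFrame, without ever importing pandas."""
    # If pandas is not in sys.modules, nothing in this process has imported it,
    # so obj cannot be a pandas object and the (expensive) import can be skipped.
    pd = sys.modules.get("pandas")
    return pd is not None and isinstance(obj, pd.DataFrame)


def column_names(data) -> list:
    """Hypothetical helper: column names of a dict-of-columns or a pandas DataFrame."""
    if is_pandas_dataframe(data):
        return [str(name) for name in data.columns]
    if isinstance(data, dict):
        return [str(name) for name in data]
    raise TypeError(f"unsupported data type: {type(data).__name__}")
```

Since the lookup never triggers an import, a NumPy-only script would not load pandas through this code path, while pandas input still works whenever pandas is present.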
FYI, what you describe is already what we used to do for a very long time, and explicitly stopped doing in #12369. It would be considerably more work (closer to 100 modules) to undo. I'm personally 👎 on going back, but perhaps you can convince the rest of @bokeh/core.
That's not sufficient. If you claim that everything works without it installed, you have to explicitly test under those conditions.
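A minimal sketch of simulating "pandas not installed" inside a test process, assuming the code under test is imported only after the block is in place: setting `sys.modules["pandas"]` to `None` makes any later `import pandas` raise `ModuleNotFoundError`. `block_import` is a hypothetical helper. Modules that already imported pandas keep their reference, so a fresh subprocess or a CI job that genuinely lacks pandas remains the more robust way to test this.

```python
import contextlib
import sys


@contextlib.contextmanager
def block_import(name: str):
    """Hypothetical test helper: make `import <name>` fail while the block is active.

    Only imports performed inside the `with` block are affected; modules that
    imported `name` earlier keep their existing reference to it.
    """
    missing = object()
    saved = sys.modules.get(name, missing)
    # A None entry in sys.modules makes the import system raise ModuleNotFoundError.
    sys.modules[name] = None
    try:
        yield
    finally:
        # Restore the previous state exactly, whether or not the module was loaded.
        if saved is missing:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = saved
```

In a test, the module under test would be imported (or reloaded) inside a `with block_import("pandas"):` block, so any eager `import pandas` in it fails loudly.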
Thanks for your response! I'll close the WIP PR then and won't take it to completion. All the best with Bokeh, and thanks again for what you do 🙏
Given the growing popularity of alternative tabular data libraries like Polars, I think it's reasonable to consider a longer-term vision where we make pandas an optional dependency again. Polars has, for example, adopted Bokeh as the default plotting backend via hvPlot, and it would be nice if that could eventually be independent of pandas. Right now this requires pandas anyway, due to dependencies in Panel, HoloViews and hvPlot and because Polars data is converted to pandas before plotting; but as we move towards eliminating those dependencies, I would eventually be in favor of doing the same in Bokeh.
I'll try to narrow down exactly where in Bokeh pandas is a hard dependency, rather than merely being used for supporting pandas data.
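A rough first pass at that audit could be automated with the standard-library `ast` module: flag imports of a package that run at module import time (top level, class bodies, or inside `if`/`try` blocks) while ignoring imports inside function bodies, which are already lazy. `eager_import_lines` and `scan_tree` are hypothetical helpers. The scan is purely syntactic, so it will also flag imports guarded by `if TYPE_CHECKING:` and will miss dynamic imports such as `importlib.import_module("pandas")`.

```python
from __future__ import annotations

import ast
import pathlib


class _EagerImportFinder(ast.NodeVisitor):
    """Collect line numbers of imports of `module` that run when a file is imported."""

    def __init__(self, module: str) -> None:
        self.module = module
        self.lines: list[int] = []

    def _matches(self, name: str | None) -> bool:
        # Match "pandas" and "pandas.xxx", but not e.g. "pandasx".
        return name is not None and (name == self.module or name.startswith(self.module + "."))

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # Function bodies only execute when called, so imports there are already lazy.
        pass

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Import(self, node: ast.Import) -> None:
        if any(self._matches(alias.name) for alias in node.names):
            self.lines.append(node.lineno)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        # level > 0 is a relative import ("from . import x"), never the external package.
        if node.level == 0 and self._matches(node.module):
            self.lines.append(node.lineno)


def eager_import_lines(source: str, module: str = "pandas") -> list[int]:
    """Line numbers in `source` where `module` is imported eagerly."""
    finder = _EagerImportFinder(module)
    finder.visit(ast.parse(source))
    return finder.lines


def scan_tree(root: str, module: str = "pandas") -> dict[str, list[int]]:
    """Map each .py file under `root` (relative path) to its eager-import line numbers."""
    hits: dict[str, list[int]] = {}
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        lines = eager_import_lines(path.read_text(encoding="utf-8"), module)
        if lines:
            hits[str(path.relative_to(root))] = lines
    return hits
```

Pointed at the package's source directory, `scan_tree` gives a per-module list of eager pandas imports, which is roughly the set of modules that would need to switch to lazy imports.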
If we do go back down this path, then I think we must also change how we handle all the sampledata. Namely, it should go in a separate pip/conda-installable package that can depend on pandas, since most of the sampledata modules depend on pandas. Then, as a bonus, we can also jettison all the S3 sampledata download machinery in the main repo.
Good plan; in fact, I'd like to look at the sampledata configuration anyway for Pyodide.
I've made a separate issue for the sampledata, since it seems advisable to do regardless.
Problem description
It'd be nice if pandas were an optional dependency, with Polars as an alternative engine. Polars has the benefit of being faster than pandas for almost everything, and it would also allow Polars users to have smaller package sizes by not requiring pandas.
Feature description
Have alternative methods that use Polars instead of pandas. There could be a transition phase before all methods are bilingual where, if the input is a Polars DataFrame but the Polars version of the Bokeh method hasn't been written yet, the user gets a warning that it's being converted to pandas to finish the task. If they don't have pandas installed, they'd of course get an error. That warning might motivate more people to contribute Polars implementations of those methods.
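The transition-phase behavior could look roughly like the shim below. This is a sketch under stated assumptions: `fallback_to_pandas` is a hypothetical name, the type check looks polars up in `sys.modules` so polars is never imported just to test a type, and the conversion relies on Polars' `DataFrame.to_pandas()`, which needs pandas installed (and, depending on the Polars version, pyarrow).

```python
import importlib.util
import sys
import warnings


def fallback_to_pandas(data, method_name: str):
    """Hypothetical transition shim: convert a Polars DataFrame to pandas, with a warning.

    Inputs that are not Polars DataFrames are returned unchanged.
    """
    pl = sys.modules.get("polars")  # a Polars object can only exist if polars was imported
    if pl is None or not isinstance(data, pl.DataFrame):
        return data
    if importlib.util.find_spec("pandas") is None:
        # No native Polars path yet and nothing to convert to: fail with a clear message.
        raise ImportError(
            f"{method_name} does not support Polars input natively yet, and pandas "
            "is not installed to convert it; install pandas to continue"
        )
    warnings.warn(
        f"{method_name} has no native Polars implementation yet; "
        "converting the input to a pandas DataFrame",
        UserWarning,
        stacklevel=2,
    )
    # Assumption: may additionally require pyarrow, depending on the Polars version.
    return data.to_pandas()
```

Each method would call the shim first and drop it once a native Polars path exists, so the warning also doubles as a to-do list of methods still missing Polars support.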
Potential alternatives
live with pandas conversions and dependencies :(
Additional information
No response