Changing CDSView filters' attributes does not trigger rendering #11560

p-himik · 2017-12-03T18:36:07Z

p-himik
Dec 3, 2017
Collaborator

Bokeh 0.12.11

from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import CDSView, IndexFilter, ColumnDataSource, Button
from bokeh.plotting import figure

ds = ColumnDataSource(dict(x=[1, 2, 3], y=[1, 2, 3]))
f = IndexFilter(indices=list(range(len(ds.data['x']))))
v = CDSView(source=ds, filters=[f])

fig = figure()
fig.x(x='x', y='y', source=ds, view=v)

b = Button(label='Change indices')


def change_indices():
    if f.indices:
        f.indices = f.indices[:-1]


b.on_click(change_indices)

curdoc().add_root(column(fig, b))

Clicking on the button results in ModelChanged messages being sent out, but the plot itself does not change.
The root cause is that all filter models do not connect any signals - effectively, they're just passive containers.

As of now, a workaround would be to re-set filters property of the CDSView instance.

bryevdv · 2017-12-05T17:54:16Z

bryevdv
Dec 5, 2017
Maintainer

agree this is an issue of missing plumbing, which might be simple to add

cc @clairetang6

0 replies

birdsarah · 2018-04-05T22:32:29Z

birdsarah
Apr 5, 2018

I just ran into this too when trying to update an IndexFilter. Was extremely confusing.

0 replies

clairetang6 · 2018-04-19T01:40:36Z

clairetang6
Apr 19, 2018
Collaborator

The CDSView has a signal set up such that when its filters property changes, it will update its indices and send its own change signal to tell the downstream GlyphRenderer to re-render. The issue is that changing an individual filter's properties doesn't cause the CDSView's filters property to be "changed", since it is still the same list with the same objects (the same filters) inside.

There is one approach to fixing this that I like the idea of, but don't know if it's simple to implement. The idea would be to have the filters themselves be "different" when their properties (such as the indices in an IndexFilter or booleans of BooleanFilter) have changed. This way, the change detection for the CDSView's filters would see the change. In python, this would be easy to implement by giving the filters their own equality definition by overriding __eq__, but I don't know if this is a pattern in Javascript/Typescript.

Another approach to fixing the issue that I like less would be to have the filters send change signals that the CDSView would respond to. The reason I like it less is that that implementation would require the CDSView to keep track of the filters in its filters list to connect and disconnect the signals as filters are added or removed.

0 replies

jimbasquiat · 2018-05-02T01:44:10Z

jimbasquiat
May 2, 2018

The solution proposed by p himik seems not to work for datatable:

from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import CDSView, IndexFilter, ColumnDataSource, Button, DataTable, TableColumn
from bokeh.plotting import figure

ds = ColumnDataSource(dict(x=[1, 2, 3], y=[1, 2, 3]))
f = IndexFilter(indices=list(range(len(ds.data['x']))))
v = CDSView(source=ds, filters=[f])

columns = [
        TableColumn(field='x', title='x', width = 15),
        TableColumn(field='y', title='y', width = 15)]
data_table = DataTable(
    source = ds, 
    columns = columns,
    view = v,
    width=100, 
    height=100)

b = Button(label='Change indices')

def change_indices():
    if f.indices:
        v.filters = [IndexFilter([0])]

b.on_click(change_indices)

curdoc().add_root(column(data_table, b))

(this doesn't update)

It however does work for figure plots: (the following would work):

from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import CDSView, IndexFilter, ColumnDataSource, Button, DataTable, TableColumn
from bokeh.plotting import figure

ds = ColumnDataSource(dict(x=[1, 2, 3], y=[1, 2, 3]))
f = IndexFilter(indices=list(range(len(ds.data['x']))))
v = CDSView(source=ds, filters=[f])

fig = figure()
fig.x(x='x', y='y', source=ds, view=v)

b = Button(label='Change indices')

def change_indices():
    if f.indices:
        v.filters = [IndexFilter([0])]


b.on_click(change_indices)

curdoc().add_root(column(fig, b))

Any idea why?

0 replies

bryevdv · 2018-05-02T03:21:20Z

bryevdv
May 2, 2018
Maintainer

@jimbasquiat, @clairetang6 thoroughly outlined the relevant issues above, no one has had a chance to actually work on connecting more signal handling yet however.

0 replies

dkapitan · 2020-01-03T01:07:43Z

dkapitan
Jan 3, 2020

@clairetang6 I have run into this issue as well. Willing to investigate and see if I can come up with a solution. Have read large part of the relevant code, but can't find/figure out where the signals are actually setup. Could you point me in de right direction, as to where to look in the code base?

UPDATE: I think I have found it, it is in bokehjs (was looking in Python code). Also, thanks @bryevdv for clarifying it on https://discourse.bokeh.org/t/how-to-trigger-customjs-on-data-change/4271/17

0 replies

bryevdv · 2021-09-04T17:54:51Z

bryevdv
Sep 4, 2021
Maintainer

Quick note that plumbing is set up to respond to changes of the filter list, just not of changes to existing filters inside the filters list. I.e. , this kind of swap out works:

def change_indices():
    f = v.filters[0]
    if f.indices:
        v.filters = [IndexFilter(indices=f.indices[:-1])]

But that really should not be necessary. Bumping this to next since it's gotten overlooked

1 reply

muendlein Nov 20, 2021

Replacing filters doesn't seem to triggered any update for DataTables.
Here is a minimal non working example:

import pandas as pd

from bokeh.layouts import row
from bokeh.models import Button, ColumnDataSource, CDSView, DataTable, IndexFilter, TableColumn
from bokeh.plotting import curdoc

col_names = ["col1", "col2", "col3"]
data = [["a", 1, 2], ["b", 3, 4], ["c", 5, 6], ["d", 7, 8]]

df = pd.DataFrame(data)
df.columns = col_names

source = ColumnDataSource(df)
selection = CDSView(source=source, filters=[IndexFilter([0])])

ui_columns = [TableColumn(field=col, title=col) for col in col_names]
table = DataTable(source=source, columns=ui_columns)
table2 = DataTable(source=source, columns=ui_columns, view=selection)

def update_filter():
    #table2.view = CDSView(source=source, filters=[IndexFilter([1])]) #view replacement works
    selection.filters = [IndexFilter(indices=[1])] # those not trigger an update

button = Button(label="Foo", button_type="success")
button.on_click(update_filter)

curdoc().add_root(row(table, button, table2))

bryevdv · 2021-09-04T18:00:35Z

bryevdv
Sep 4, 2021
Maintainer

In retrospect, I rather wish we had simply combined "view" and "filter" since supporting multiple filters per view is a huge complication, as well as nothing anyone has every actually wanted to do in practice in my experience. I'm willing to compromise here to make the common case better. Maybe we can change course for 3.0 e.g something like

p.scatter('x', 'y', source=s, view=GroupFilterView(groups=...))
p.scatter('x', 'y', source=s, view=IndexFilterView(indices=...))
p.scatter('x', 'y', source=s, view=MaskFilterView(mask=...))
p.scatter('x', 'y', source=s, view=CustomJSFilterView(...))

and I would also support convenience methods on CDS to reduce imports and also ensure the correct source is configured on the view:

p.scatter('x', 'y', source=s, view=s.group(...))
p.scatter('x', 'y', source=s, view=s.index(...))
p.scatter('x', 'y', source=s, view=s.mask(...))

That would be much more ergonomic for users. And more importantly, for us, not having three levels of objects to chain changes together for would be a dramatic simplification, and I think it would satisfy the vast majority of use cases. cc @bokeh/dev

Edit: Elaborating on "majority use cases" I think the most common reason to want multiple filters might be to show multiple groups, and I think the simplest way to cater to that is simply to let one GroupFilterView be configured with multiple groups as a list. If folks really do need something more sophisticate there is always CustomJS route. But as things are I think we have made the overwhelmingly common cases sub-adequate in exchange for an attempt at generality that is not needed.

0 replies

p-himik · 2021-09-04T18:30:35Z

p-himik
Sep 4, 2021
Collaborator Author

I would go in a bit different direction. Using both "view" and "filter" in any part of the API makes it much more confusing, so I would just stick to one. If it were up to me, I'd probably choose filter because it's more explicit. A view assumes that anything can be done to the data, but it's absolutely not the case - CDSView only filters the data.

But it would solve only the most superficial problem with the views/filters. There are other, much more fundamental problems:

CDSView has a public state
A lot of components rely on that state, rely on it being set to just the right values at just the right time, thus coupling things greatly and allowing for a whole family of bugs and inconsistencies (this issue, plus data tables seem to have it as well, judging by the code; I won't bet that there are no other such locations)
Everything or almost everything that needs to have a data source, now also needs to have a view. The amount of required care has risen dramatically

The proposition - hide all the complexity of the source+view+filter unholy trinity from their clients. Instead, create something like a FilteredSource that just wraps the original source and is a proper source for all intents and purposes of the components that end up using it. It will be completely opaque to its users - they will never know that there's even a filter there (of course, one must not check typeof source - that would negate the whole point). It will still be the same source underneath, all the data changes will still be propagated, all the signals will still (or finally?) be connected automatically just because everything already cares about sources and FilteredSource would take care of connecting the filters.
What do you think?

0 replies

bryevdv · 2021-09-04T18:38:39Z

bryevdv
Sep 4, 2021
Maintainer

@p-himik I think we agree 100% on the problems, but I am not so sure about the solution. I do think there is value in having the view be something that is configured on a glyph (or table). It's easy for me to imagine a situation where a user wants to show different views of one data source by adding different views to different glyphs. Unlike multiple filters on one glyph, that scenario seems perfectly common and reasonable. And it's conceptually a model that makes sense to me. But if we couple the view and the filter and the source then all the sudden we have a multiplicity of source types (which I don't like to begin with), but also users would have to create different source objects for different which opens up awhole host of questions about data ownership semantics, etc. both from an implementation side and also a user confusion side.

Just by way of comparison:

s =  CDS() # one data set to drive multiple glyphs

p.scatter('x', 'y', source=s, view=s.group(...))
p.scatter('x', 'y', source=s, view=s.index(...))
p.scatter('x', 'y', source=s, view=s.mask(...))

vs (IIUC)

s =  CDS() # one data set to drive multiple glyphs

p.scatter('x', 'y', source=GroupSource(s,...))
p.scatter('x', 'y', source=IndexSource(s,...))
p.scatter('x', 'y', source=maskSource(s,...))

I think sources are already laden with enough meaning and baggage without tryin to parse what it means to configure one source with another source (which would still be required). And to afford a "custom js" version it would be CustomJSSource which is tortuous to me in terms of meanings.

Everything or almost everything that needs to have a data source, now also needs to have a view. The amount of required care has risen dramatically

Everything in this case is: tables and data renderers. I do believe we now have lots of experience built up to handle the event plumbing for just a single view object, or its properties, changing. That general scenario is common enough. Where we got into trouble is eventing for a view, or list of filters on the view, or the filters themselves, or any of the properties on any of the filters, all of which could potentially change at any time or even in principle simultaneously .

0 replies

p-himik · 2021-09-04T19:23:54Z

p-himik
Sep 4, 2021
Collaborator Author

a situation where a user wants to show different views of one data source by adding different views to different glyphs

And your examples show how this is possible, so it's not an issue. My proposal doesn't break any existing functionality.

Unlike multiple filters on one glyph, that scenario seems perfectly common and reasonable.

I would argue that multiple filters are also useful. And are even needed, to not make the users reinvent the wheel.
I use them in my projects. One example - a plot with multiple filtering widgets. Each widget controls its own filter. New widgets can be added at runtime to avoid cluttering the UI. I wouldn't want to see this functionality go away and to have redo it all myself in a CustomJSFilter or whatever it ends up being.

all the sudden we have a multiplicity of source types (which I don't like to begin with)

Not of source types but of source wrappers. Right now we have a multiplicity of filters. You offer a multiplicity of views. How is multiplicity of source wrappers different from that?

p.scatter('x', 'y', source=s, view=s.group(...))

Don't do this. This is tight, unnecessary coupling. Now the very basic, core model ColumnDataSource knows about all kinds of filters Bokeh users might want.

p.scatter('x', 'y', source=GroupSource(s,...))

This is miles better. It doesn't couple anything - GroupSource is just a thin wrapper. GroupSource(...) is absolutely not different from s.group(...) - it's just a few more characters. And it clearly shows that there's a new model, thus making it more explicit - which is usually better.
Compared to s.group, it is also extendable from the outside in a very clear way - it's just a custom new model. A user won't even think about trying to add a new function to the existing ColumnDataSource class or to create a new data source class that inherits from CDS. We can even have MultipleFiltersSource that has the current behavior of a view with multiple filters.

what it means to configure one source with another source (which would still be required).

Just to reiterate - I'm proposing wrappers, not fully fledged new sources. They will be sources only according to the LSP ("is a") and not to its actual implementation. (perhaps streaming and patching should be prohibited for such wrappers, so not the truest of "is a")

users would have to create different source objects for different [glyphs] which opens up awhole host of questions about data ownership semantics, etc

But we already have that problem..? CDSView requires source, and it must be the same source as the one you set on the glyph. I have no idea if there are checks for that, but I'm pretty sure that providing different sources would be considered an undefined behavior.
So why exactly is the proposed solution more confusing than what users have to do with CDSView or with extra filters? If anything, it's less confusing - because one would have to pass only one source to the glyphs and other models and not a source and a view with its own source that's actually the same one.

Everything in this case is: tables and data renderers

There's also bokeh.models.transforms.Interpolator. It doesn't know about views now, but ideally it should - interpolate while ignoring outliers is a perfect use case.
There are also bokeh.models.annotations.LabelSet and all subclasses of bokeh.models.annotations.DataAnnotation - same thing.

With a data source wrapper, everything that needs a source will get automatic support for filtering - the source code of any model that needs a data source won't have to be changed. Including all the possible future models and custom user models - for free! Isn't that neat?

we now have lots of experience built up to handle the event plumbing for just a single view object

Experience reduces the probability of messing things up. Removing all the places where things can go wrong simply nullifies that probability.
Apart from that, experience is not something that stays with a project - code is.

Where we got into trouble is eventing for a view, or list of filters on the view, or the filters themselves, or any of the properties on any of the filters, all of which could potentially change at any time or even in principle simultaneously

And to me, this list seems much longer than it could've been. And it's even a bit shorter than it actually is - you didn't mention the timing coupling: #7299

0 replies

bryevdv · 2021-09-04T22:21:59Z

bryevdv
Sep 4, 2021
Maintainer

Don't do this. This is tight, unnecessary coupling. Now the very basic, core model ColumnDataSource knows about all kinds of filters Bokeh users might want.

This is purely syntactic sugar:

def group(self, **kw): 
    return GroupFilterView(self, **kw)

It's an affordance for common cases, nothing more, and does not get int he way of extension filters or customjs filters or just doing things manually if that's what someone prefers:

view = SomeFilterView(s, ...)

it's just a few more characters

It's a few more characters plus an entire import and I can't stress enough that Bokeh already gets dinged for being to verbose in this regard, compared to other libraries.

Not of source types but of source wrappers. Right now we have a multiplicity of filters. You offer a multiplicity of views. How is multiplicity of source wrappers different from that?

Wrappers are definitely more confusing in my opinion. Many users are not familiar with the concept at all.

glyph has a source, and optionally a view on that source

is much easier to explain than

glyph has a source, but that source can also be composed via function application to a some facade wrapper that looks like a source, but actually isn't

But we already have that problem..? CDSView requires source, and it must be the same source as the one you set on the glyph. I have no idea if there are checks for that

There are already validation checks exactly for this. Alerting users to mis-configuration is not an issue.

So why exactly is the proposed solution more confusing than what users have to do with CDSView or with extra filters?

Are we talking each other? I don't think your proposed solution is more confusing that the current things users have to do. But I do think encumbering sources is more confusing than maintaining a separation between a source and a view on a source. I have ~10 years of experience having to explain all the various things about CDS, and I very strongly don't want to encumber the "data source" concept itself with even more baggage. Furthermore I think the notion of a separate view on to something as its own things is very well established in general. We can rely on that general knowledge to present and explain our particular implementation.

There are also bokeh.models.annotations.LabelSet and all subclasses of bokeh.models.annotations.DataAnnotation - same thing.

Those are data renderers.

And to me, this list seems much longer than it could've been. And it's even a bit shorter than it actually is - you didn't mention the timing coupling: #7299

That's my point! I stopped at a reasonable approximation and it's already too much. Bu my further point is that what I am proposing get's down to waht has already been demonstrated as manageable.

Here is an another reasonable use case:

I want to update source.data = new_data while keeping the same group filter.

That seems pretty reasonable to want. How does that look in your proposal. There is not currently any actual property delegation mechanism so wrapper.data = new_data is not feasible without alot of up front ancillary work. Changing actual sources (so also presumably wrappers on sources) is a heavy weight operation we actively discourage. So, technically speaking, the process is exactly the same, just source.data = new_data But now source is some internally configured thing on the wrapper, and not the thing that was configured on the glyph. That is confusing.

0 replies

p-himik · 2021-09-04T22:41:14Z

p-himik
Sep 4, 2021
Collaborator Author

plus an entire import

I see what you mean, and I guess it's just a detriment of "the Python way" - everywhere you look in the Python documentation, the objects are imported directly and not used via an alias. Nowadays I try to stick to something like this: import bokeh.models as m. Then I don't have to change the imports section at all if I need to use something - I just use m.Model and whatnot.

[...]
is much easier to explain than
[...]

I would argue the opposite because that's exactly how Pandas does it, albeit with the same affordances - iloc and such are members of a DataFrame and they create a view into a data frame. There are no explicit views there at all.

There are already validation checks exactly for this. Alerting users to mis-configuration is not an issue.

Then why do you think it will create a host of new questions? The situation is exactly the same, only simpler - the same source will have to be mentioned only once per glyph, not twice.

How does that look in your proposal

As simple as this:

ds = ColumnDataSource(...)
p.circle(source=GroupFilterSource(source=ds, column_name='a', group='b'))
[...]
ds.source = new_data

The glyph doesn't care - only the wrapper does. It does all the job.
The wrapper source models will simply listen to any changes to ds and will recalculate their indices - notifying each and every listener that their data, in turn, has changed. And of course their data will be not a regular member but a property that returns the data according to the cached indices. Nothing in the outer world will know that it's not the real data and that there's some cache in there.

0 replies

bryevdv · 2021-09-04T22:45:23Z

bryevdv
Sep 4, 2021
Maintainer

you replied while I was editing :)

So, technically speaking, the process is exactly the same, just source.data = new_data But now source is some internally configured thing on the wrapper, and not the thing that was configured on the glyph. That is confusing.

I should have added "to me". I'll have to sit with this for some time @p-himik I still don't see the advantages but you clearly feel strongly about it so it deserves more careful consideration. As an aside, this should probably move to a discussion. Unless you object, since there is already considerable discussion here, I'd just convert this issue to a discussion and open a new issue referencing the discussion.

0 replies

p-himik · 2021-09-04T22:48:24Z

p-himik
Sep 4, 2021
Collaborator Author

No objections! To be honest, I don't really have a clue whether such a conversion entails something serious, but I'm curious to see how it works out.

0 replies

bryevdv · 2021-09-05T06:18:44Z

bryevdv
Sep 5, 2021
Maintainer

It looks like #11439 is actually a due of this, so I will convert this to a discussion and leave #11439 open to link to eventual PR work.

0 replies

p-himik · 2021-09-05T06:51:59Z

p-himik
Sep 5, 2021
Collaborator Author

Accidentally found another related issue: #8023
A use-case that's distinct from glyphs and timing issues, and would also be solved automatically by data being filtered at the data source level.

0 replies

p-himik · 2021-09-05T11:34:52Z

p-himik
Sep 5, 2021
Collaborator Author

Here's a rough POC. Most of the TypeScript code is just slightly modified code from CDSView.
I would like to extend it further so it includes:

Runtime filter changing (both instances and arguments)
Data points selection
Data tables and other non-glyph things

Let me know if you'd like to see anything else in addition to that.
The hope is that demonstrating it all doesn't even require any changes to Bokeh itself, although a proper solution would still ideally involve some changes to simplify quite a few things.
I should have more time for this the next week.

from random import random, choices

import bokeh.models as m
from bokeh.core.property.container import Seq
from bokeh.core.property.instance import Instance
from bokeh.io import show
from bokeh.plotting import figure
from bokeh.transform import dodge
from bokeh.util.compiler import TypeScript

# language=TypeScript
TS_CODE = """\
import * as p from "core/properties"
import {Arrayable, Indices} from "core/types"
import {Filter} from "models/filters/filter"
import {ColumnarDataSource} from "models/sources/columnar_data_source";

export namespace FilteredDataSource {
  export type Attrs = p.AttrsOf<Props>
  
  export type Props = ColumnarDataSource.Props & {
    filters: p.Property<Filter[]>
    source: p.Property<ColumnarDataSource>
    data: p.Property<{[key: string]: Arrayable}>
  }
}

export interface FilteredDataSource extends FilteredDataSource.Attrs {}

export class FilteredDataSource extends ColumnarDataSource {
  properties: FilteredDataSource.Props
  private _dirty: boolean = true
  private _data: {[key: string]: Arrayable}
  
  constructor(attrs?: Partial<FilteredDataSource.Attrs>) {
    super(attrs)
  }
  
  static init_FilteredDataSource(): void {
    this.define<FilteredDataSource.Props>(({Array, Ref}) => ({
      filters: [Array(Ref(Filter)), []],
      source: [Ref(ColumnarDataSource)],
    }))
  }
  
  connect_signals(): void {
    super.connect_signals()
    const on_change = () => {
      this._dirty = true
      this.change.emit()
    }
    this.connect(this.properties.filters.change, on_change)
    this.connect(this.properties.source.change, on_change)
    this.connect(this.source.change, on_change)
    this.connect(this.source.streaming, () => {
      this._dirty = true
      this.streaming.emit()
    })
    this.connect(this.source.patching, (indices) => {
      this._dirty = true
      this.patching.emit(indices)
    })
  }
  
  compute_filtered_data(): void {
    const {source} = this
    const size = source.get_length() ?? 1
    const indices = Indices.all_set(size)
    
    for (const filter of this.filters) {
      indices.intersect(filter.compute_indices(source))
    }
    
    this._dirty = false
    this._data = Object.keys(this.source.data).reduce((data, k) => {
      data[k] = indices.select(this.source.data[k])
      return data
    }, {} as {[key: string]: Arrayable})
  }
  
  // Needed to avoid any modifications to `ColumnarDataSource` to make this example work.
  // @ts-ignore
  get data(): {[key: string]: Arrayable} {
    if (this._dirty) {
      this.compute_filtered_data()
    }
    return this._data 
  }
  
  set data(_: {[key: string]: Arrayable}) {
    throw new Error('FilteredDataSource instances do not suppot setting data - set data on the original source instead.')
  }
}
"""


class FilteredDataSource(m.ColumnarDataSource):
    filters = Seq(Instance(m.Filter))
    source = Instance(m.ColumnarDataSource)

    __implementation__ = TypeScript(TS_CODE)


N = 100
ds = m.ColumnDataSource(dict(x=list(range(N)),
                             y=list(range(N)),
                             z=[random() for _ in range(N)],
                             s=choices('abcd', k=N)))
p = figure()
p.circle('x', 'y', color='gray',
         source=ds)
p.circle('x', dodge('y', 10), color='blue',
         source=FilteredDataSource(source=ds,
                                   filters=[m.BooleanFilter(booleans=[i % 2 == 0 for i in range(N)]),
                                            m.IndexFilter(indices=[idx for idx, z in enumerate(ds.data['z']) if z > 0.5])]))
p.triangle('x', dodge('y', -10), color='green',
           source=FilteredDataSource(source=ds,
                                     filters=[m.GroupFilter(column_name='s', group='a')]))

show(p)

@bryevdv You mentioned that label sets and data annotations are, quote, "data renderers". What does it mean exactly? They accept a data source - why would we not want to be able to filter them as well?

0 replies

p-himik · 2021-09-09T19:55:39Z

p-himik
Sep 9, 2021
Collaborator Author

Here's an updated example that implements most of the things I mentioned, and some more:

Runtime filter changing (both instances and arguments)
Data points selection - alas, can't make it work without some changes to Bokeh
Data tables and other non-glyph things - a data table, whiskers, a label set
Runtime data source changing - a bonus item
Proper legends - a bonus item that fixes color mapping in legend does not take current view (CDSView) into account and color mapping in circle fill is wrong after subsampling with box_select from another plot #8010
Working expressions, namely cumsum - a bonus item that fixes transform.cumsum doesn't take into account the CDSView #8023. Well, almost - details below.

Regarding data points selection/inspection - Bokeh modifications are need in order to:

Reduce coupling of multiple components with CDSView and SelectionManager
Move index mapping from CDSView to some other place, probably SelectionManager or somewhere nearby

Regarding runtime data source changing - I know, doesn't sound like something useful. But actually a use-case is easy to imagine - switching between local data and data provided by an instance of WebDataSource. Or switching between multiple instances of WebDataSource. And in general, it might be easier to organize data as multiple data sources as opposed to a single DS with an extra field for which you would need to create and manage an extra filter. Just as with Pandas - if you use DataFrames with a low number of static categories, it often makes more sense to have them as separate objects rather than as a single giant data frame with an extra column.

Regarding working expressions - unfortunately, #7299 still prevents it from working properly, so the relevant code is commented out in the example. It does work, but breaks as soon as you make the filter less specific. It would work properly if we move expressions from the level of fields to the level of, you guessed it, another data source wrapper. :) So in the end, we would have a pipeline of data transformation that's fed to the views at the very end. Given the state of the example right now, the idea is definitely growing on me. If we forget about the existing Bokeh code for a second, isolating data providers from consumers and thus making glyphs the simplest they could be sounds pretty good.

I also want to mention that after completing this example I, no joke, sat there for a minute and marveled at how utterly simple the code of create_fds_row is. Not because I wrote it course - but because there's nothing to remove. Every single piece of it deals not with some obscure Bokeh peculiarity, like having to call emit() in some very particular cases, or having to completely throw away and recreate a model, but rather with the task at hand.
And for any new model that needs to take action when its data changes, the process of delivering the current data will be automatic - with zero need for plumbing writing and maintenance.

from random import random, choices

import bokeh.models as m
from bokeh.core.property.container import Seq
from bokeh.core.property.instance import Instance
from bokeh.io import show
from bokeh.layouts import row, column
from bokeh.plotting import figure
from bokeh.transform import dodge, transform, cumsum, factor_cmap
from bokeh.util.compiler import TypeScript
from bokeh.palettes import Viridis

# language=TypeScript
TS_CODE = """\
import * as p from "core/properties"
import {Arrayable, Indices} from "core/types"
import {Filter} from "models/filters/filter"
import {ColumnDataSource} from "models/sources/column_data_source";
import * as kinds from "core/kinds"


export namespace FilteredDataSource {
  export type Attrs = p.AttrsOf<Props>
  
  export type Props = ColumnDataSource.Props & {
    filters: p.Property<Filter[]>
    source: p.Property<ColumnDataSource>
    data: p.Property<{[key: string]: Arrayable}>
  }
}

export interface FilteredDataSource extends FilteredDataSource.Attrs {}

export class FilteredDataSource extends ColumnDataSource {
  properties: FilteredDataSource.Props
  private _dirty: boolean = true
  private _data: {[key: string]: Arrayable}
  private _old_filter_connections: [any, any][] = []
  private _old_source_connections: [any, any][] = []
  
  constructor(attrs?: Partial<FilteredDataSource.Attrs>) {
    super(attrs)
  }
  
  static init_FilteredDataSource(): void {
    this.define<FilteredDataSource.Props>(({Array, Ref}) => ({
      filters: [Array(Ref(Filter)), []],
      source: [Ref(ColumnDataSource)],
    }))
    // Defining the `data` property manually.
    // Needed to both have the property and have custom logic for its retrieval/assignment.
    const name = 'data'
    Object.defineProperty(this.prototype, name, {
      get(this: FilteredDataSource): any {
        if (this._dirty) {
          this.compute_filtered_data()
        }
        return this._data 
      },
      set(this: FilteredDataSource, _value: any): FilteredDataSource {
        throw new Error('FilteredDataSource instances do not suppot setting data - set data on the original source instead.')
      },
      configurable: false,
      enumerable: true,
    })
    const refined_prop = {
      type: kinds.Dict(kinds.Any) as any,
      default_value: () => {return {}},
      options: {},
    }

    const props = {...this.prototype._props}
    props[name] = refined_prop
    this.prototype._props = props
  }
  
  _mark_dirty(): void {
    this._dirty = true
    this.change.emit()
  }
  
  _reconnect_filters(): void {
    const mark_dirty = this._mark_dirty.bind(this)
    for (const [signal, slot] of this._old_filter_connections) {
      this.disconnect(signal, slot)
    }
    this._old_filter_connections = []
    for (const f of this.filters) {
      this.connect(f.change, mark_dirty)
      this._old_filter_connections.push([f.change, mark_dirty])
    }
  }
  
  _reconnect_source(): void {
    for (const [signal, slot] of this._old_source_connections) {
      this.disconnect(signal, slot)
    }
    this._old_source_connections = []
    
    const mark_dirty = this._mark_dirty.bind(this)
    this.connect(this.source.change, mark_dirty)
    this._old_source_connections.push([this.source.change, mark_dirty])
    
    this.connect(this.source.properties.data.change, this.properties.data.change.emit)
    this._old_source_connections.push([this.source.properties.data.change, this.properties.data.change.emit])
    
    const on_streaming = () => {
      this._dirty = true
      this.streaming.emit()
    }
    this.connect(this.source.streaming, on_streaming)
    this._old_source_connections.push([this.source.streaming, on_streaming])
    
    const on_patching = (indices: number[]) => {
      this._dirty = true
      this.patching.emit(indices)
    }
    this.connect(this.source.patching, on_patching)
    this._old_source_connections.push([this.source.patching, on_patching])
  }
  
  connect_signals(): void {
    super.connect_signals()
    const mark_dirty = this._mark_dirty.bind(this)
    
    this.connect(this.properties.filters.change, () => {
      this._reconnect_filters()
      mark_dirty()
    })
    this._reconnect_filters()
    
    this.connect(this.properties.source.change, () => {
      this._reconnect_source()
      mark_dirty()
    })
    this._reconnect_source()
  }
  
  compute_filtered_data(): void {
    const {source} = this
    const size = source.get_length() ?? 1
    const indices = Indices.all_set(size)
    
    for (const filter of this.filters) {
      indices.intersect(filter.compute_indices(source))
    }
    
    this._dirty = false
    this._data = Object.keys(this.source.data).reduce((data, k) => {
      data[k] = indices.select(this.source.data[k])
      return data
    }, {} as {[key: string]: Arrayable})
  }
}
"""


class FilteredDataSource(m.ColumnDataSource):
    filters = Seq(Instance(m.Filter))
    source = Instance(m.ColumnDataSource)

    @property
    def column_names(self):
        return self.source.column_names

    __implementation__ = TypeScript(TS_CODE)


N = 100
S_CHOICES = 'abcd'
DODGE = 10
ds = m.ColumnDataSource(dict(x=list(range(N)),
                             y=list(range(N)),
                             z=[random() for _ in range(N)],
                             s=choices(S_CHOICES, k=N),
                             s2=choices(S_CHOICES, k=N)))

ds2 = m.ColumnDataSource(dict(x=list(range(N)),
                              y=list(reversed(range(N))),
                              z=[random() for _ in range(N)],
                              s=choices(S_CHOICES, k=N),
                              s2=choices(S_CHOICES[0], k=N)))


def create_fds_row():
    cmap = factor_cmap('s', factors=list(S_CHOICES), palette=Viridis[len(S_CHOICES)])

    p1 = figure(tools='box_select', x_range=(0, N), y_range=(0, N + DODGE))
    p1.triangle('x', 'y', color=cmap, source=ds, legend_label='Unfiltered data')

    head_filter = m.IndexFilter(indices=list(range(min(10, N))))
    remove_odd_filter = m.BooleanFilter(booleans=[i % 2 == 0 for i in range(N)])
    remove_every_third_filter = m.BooleanFilter(booleans=[i % 3 == 0 for i in range(N)])
    z_filter = m.CustomJSFilter(args=dict(min_z=0, max_z=1),
                                # language=JavaScript
                                code="return source.data.z.map(z => (min_z <= z && z <= max_z));")
    s_filter = m.GroupFilter(column_name='s', group=S_CHOICES[0])
    complex_filtered_ds = FilteredDataSource(source=ds, filters=[])

    p1.circle('x', dodge('y', DODGE), color=cmap, legend_field='s',
              source=complex_filtered_ds)

    def mk_whisker_transform(for_field):
        n = 4
        sign = '+' if for_field == 'upper' else '-'
        code = f"""
            const ys = ds.data.y;
            return xs.map((x, idx) => ys[idx] {sign} (x * {n}) + {DODGE});
        """
        return transform('z', m.CustomJSTransform(args=dict(ds=complex_filtered_ds),
                                                  v_func=code))

    p1.add_layout(m.Whisker(lower=mk_whisker_transform('lower'),
                            upper=mk_whisker_transform('upper'),
                            base='x',
                            source=complex_filtered_ds))

    p2 = figure(tools='', x_range=(0, N), y_range=(0, 1))
    p2.circle('x', 'z', color=cmap,
              source=complex_filtered_ds)

    p2.add_layout(m.LabelSet(x='x', y='z', text='s', x_offset=3, y_offset=3,
                             source=complex_filtered_ds))

    # Does not work because of #7299.
    # p3 = figure(x_range=S_CHOICES)
    # for s in S_CHOICES:
    #     p3.vbar(x='s2', width=0.5,
    #             bottom=cumsum('z', include_zero=True),
    #             top=cumsum('z'),
    #             color=cmap,
    #             # Nesting sources!!!
    #             source=FilteredDataSource(source=complex_filtered_ds,
    #                                       filters=[m.GroupFilter(column_name='s2', group=s)]))

    t = m.DataTable(columns=[m.TableColumn(field='x', title='X'),
                             m.TableColumn(field='y', title='Y'),
                             m.TableColumn(field='z', title='Z'),
                             m.TableColumn(field='s', title='S')],
                    source=complex_filtered_ds)

    ds_selector = m.RadioButtonGroup(labels=['Data 1', 'Data 2'], active=0)
    ds_selector.js_on_change('active', m.CustomJS(args=dict(sources=[ds, ds2], filtered_source=complex_filtered_ds),
                                                  # language=JavaScript
                                                  code="filtered_source.source = sources[cb_obj.active];"))

    z_slider = m.RangeSlider(title='Z filter', value=(0, 1), start=0, end=1, step=0.01,
                             disabled=True)
    z_slider.js_on_change('value', m.CustomJS(args=dict(z_filter=z_filter),
                                              # language=JavaScript
                                              code="""
                                                  const [min_z, max_z] = cb_obj.value;
                                                  z_filter.args = {min_z, max_z};
                                              """))

    s_selector = m.RadioGroup(labels=list(S_CHOICES), active=S_CHOICES.index(s_filter.group),
                              disabled=True)
    s_selector.js_on_change('active', m.CustomJS(args=dict(s_filter=s_filter),
                                                 # language=JavaScript
                                                 code="s_filter.group = cb_obj.labels[cb_obj.active];"))

    filter_selector = m.CheckboxButtonGroup(
        labels=['Show only first 10', 'Remove odd', 'Remove every third', 'Filter by Z', 'Filter by S']
    )
    filter_selector.js_on_change('active', m.CustomJS(
        args=dict(filters=[head_filter, remove_odd_filter, remove_every_third_filter, z_filter, s_filter],
                  filter_widgets=[None, None, None, z_slider, s_selector],
                  filtered_ds=complex_filtered_ds),
        # language=JavaScript
        code="""
            filtered_ds.filters = cb_obj.active.map(idx => filters[idx]);
            const active = new Set(cb_obj.active);
            filter_widgets.forEach((w, idx) => {
                if (w) {
                    w.disabled = !active.has(idx);
                }
            });
        """))

    return row(p1, p2,  # p3,
               column(t,
                      ds_selector,
                      filter_selector,
                      z_slider,
                      row(m.Div(text='S filter', render_as_text=True), s_selector)))


show(create_fds_row())

3 replies

bryevdv Sep 9, 2021
Maintainer

@p-himik I ran this right quick and it is a very impressive demonstration. I will look more closely at the technical details next week after the 2.4 release, but I'll share two quick thoughts

I think our two proposals are actually not that different, in that they require associating the same components (a source, and filters on that source). Really the main thing is just defining where and how that association happens, and there is a tidyness to the above. If we added the ability for FilteredDataSource to accept a data dict and construct its own CDS that might go a ways towards my concerns about burdening users with imports.
It also occurred to me that if someone is willing to do the lifting, I should probably just get out of their way :)

p-himik Sep 9, 2021
Collaborator Author

If we added the ability for FilteredDataSource to accept a data dict and construct its own CDS that might go a ways towards my concerns about burdening users with imports

Oh, compared to the other changes, this is a very small concern that is trivially fixed. Just like with the whole plotting API, there can be a plethora of helping functions that make things more convenient. I'm not sure about specifically making FilteredDataSource storing raw data because it will worsen the architecture by smudging the concerns. But some function, maybe even a static member of FilteredDataSource, could definitely create a hidden CDS instance for the user.

if someone is willing to do the lifting, I should probably just get out of their way :)

The thing is, if there were nobody in my way, I would change a lot. :) Well, if I also had infinite time.
So just because I (or anyone else) want to do the lifting, doesn't mean that that lifting is useful - we still need to discuss things.
But given that we're already on the same page, at least to some extent, I'll try to figure out how to approach this particular topic of filtering/transforming data sources in a gradual way. And the first step would be decoupling CDSView and SelectionManager from some of the components, so that FilteringDataSource has some space to breathe.

bryevdv Sep 9, 2021
Maintainer

I'm not sure about specifically making FilteredDataSource storing raw data

Sorry if I was confusing, I am definitely not suggesting that. I mean only that it can create a CDS for you automatically, in cases where you wouldn't otherwise need to create one up front separately. i.e. there would always be FDS.source just as you have above.

we still need to discuss things.

Absolutely :) I only mean that on the spectrum of possibilities that it's better for me to aim for sounding board and facilitator, rather than gatekeeper.

But given that we're already on the same page, at least to some extent, I'll try to figure out how to approach this particular topic of filtering/transforming data sources in a gradual way.

👍 look forward to seeing what you come up with!

Changing CDSView filters' attributes does not trigger rendering #11560

p-himik Dec 3, 2017 Collaborator

Replies: 19 comments · 4 replies

bryevdv Dec 5, 2017 Maintainer

birdsarah Apr 5, 2018

clairetang6 Apr 19, 2018 Collaborator

jimbasquiat May 2, 2018

bryevdv May 2, 2018 Maintainer

dkapitan Jan 3, 2020

bryevdv Sep 4, 2021 Maintainer

muendlein Nov 20, 2021

bryevdv Sep 4, 2021 Maintainer

p-himik Sep 4, 2021 Collaborator Author

bryevdv Sep 4, 2021 Maintainer

p-himik Sep 4, 2021 Collaborator Author

bryevdv Sep 4, 2021 Maintainer

p-himik Sep 4, 2021 Collaborator Author

bryevdv Sep 4, 2021 Maintainer

p-himik Sep 4, 2021 Collaborator Author

bryevdv Sep 5, 2021 Maintainer

p-himik Sep 5, 2021 Collaborator Author

p-himik Sep 5, 2021 Collaborator Author

p-himik Sep 9, 2021 Collaborator Author

bryevdv Sep 9, 2021 Maintainer

p-himik Sep 9, 2021 Collaborator Author

bryevdv Sep 9, 2021 Maintainer

p-himik
Dec 3, 2017
Collaborator

Replies: 19 comments 4 replies

bryevdv
Dec 5, 2017
Maintainer

birdsarah
Apr 5, 2018

clairetang6
Apr 19, 2018
Collaborator

jimbasquiat
May 2, 2018

bryevdv
May 2, 2018
Maintainer

dkapitan
Jan 3, 2020

bryevdv
Sep 4, 2021
Maintainer

bryevdv
Sep 4, 2021
Maintainer

p-himik
Sep 4, 2021
Collaborator Author

bryevdv
Sep 4, 2021
Maintainer

p-himik
Sep 4, 2021
Collaborator Author

bryevdv
Sep 4, 2021
Maintainer

p-himik
Sep 4, 2021
Collaborator Author

bryevdv
Sep 4, 2021
Maintainer

p-himik
Sep 4, 2021
Collaborator Author

bryevdv
Sep 5, 2021
Maintainer

p-himik
Sep 5, 2021
Collaborator Author

p-himik
Sep 5, 2021
Collaborator Author

p-himik
Sep 9, 2021
Collaborator Author

bryevdv Sep 9, 2021
Maintainer

p-himik Sep 9, 2021
Collaborator Author

bryevdv Sep 9, 2021
Maintainer