
ENH: Allow linear_sum_assignment to accept a ufunc for very large graphs #20729

Open
matham opened this issue May 16, 2024 · 3 comments
Labels: enhancement (A new feature or improvement), scipy.optimize

matham commented May 16, 2024

Is your feature request related to a problem? Please describe.

We are trying to evaluate how well different cell detection algorithms find cells, so we want to find the minimum-cost pairing between two sets of cells in terms of Euclidean distance.

The problem is that even for 250k cells, the cost matrix required by linear_sum_assignment would be over 200 GB in float32 (250,000² entries × 4 bytes), which is far too large to hold in memory.
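A quick back-of-the-envelope check of that figure (my own arithmetic, not from the linked issue):

n = 250_000                        # cells on each side
bytes_per_entry = 4                # float32
print(n * n * bytes_per_entry / 1024**3)  # ≈ 232.8 GiB for the dense cost matrix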

Describe the solution you'd like.

I'd like linear_sum_assignment to accept something like a ufunc that computes the cost on the fly instead of a precomputed cost matrix. Or even better (but obviously less general), it could compute the distance between two sets of inputs itself.

Describe alternatives you've considered.

I thought about hacking up a fake array-like object that computes the distance on every access, but I can't imagine that would be fast.
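Purely for illustration, a minimal sketch of that "fake array" idea (a hypothetical class, nothing I've benchmarked); note that linear_sum_assignment most likely converts its input to a dense ndarray, so an object like this would probably get materialized anyway:

import numpy as np

class LazyCostMatrix:
    # Sketch: compute the Euclidean cost for an (i, j) pair only when accessed.
    def __init__(self, a, b):
        self.a = a                      # (N, K) points
        self.b = b                      # (M, K) points
        self.shape = (len(a), len(b))
        self.dtype = np.float32

    def __getitem__(self, idx):
        i, j = idx
        return np.linalg.norm(self.a[i] - self.b[j])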

Additional context (e.g. screenshots, GIFs)

brainglobe/brainglobe-utils#74.

dschmitz89 (Contributor) commented

I am not familiar with linear_sum_assignment but could sparse matrices be a potential alternative?
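If a sparse formulation is viable (i.e. only a small number of candidate pairings per cell need to be considered), scipy.sparse.csgraph.min_weight_full_bipartite_matching already accepts a sparse biadjacency matrix. A rough sketch, keeping in mind that missing entries are treated as forbidden assignments rather than zero cost:

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import min_weight_full_bipartite_matching

# Keep only a handful of plausible candidate pairs so the matrix stays sparse.
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 1, 1, 2])
costs = np.array([1.5, 2.0, 0.7, 3.1])
biadjacency = csr_matrix((costs, (rows, cols)), shape=(3, 3))

row_ind, col_ind = min_weight_full_bipartite_matching(biadjacency)
print(row_ind, col_ind, biadjacency[row_ind, col_ind].sum())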

matham (Author) commented May 17, 2024 via email

matham (Author) commented May 17, 2024

What about a proposed function like this:

from typing import Callable, Optional
import numpy as np

def linear_sum_assignment(
    array1: np.ndarray, array2: np.ndarray,
    cost: Optional[Callable] = None, callback: Optional[Callable] = None,
) -> tuple[np.ndarray, np.ndarray]:
    pass

Here array1 and array2 are N×K and M×K respectively, and cost can be None or a callable (or possibly a string?). If it's None (or an "l2"/"distance" string?), the cost is the L2 norm between rows, i.e. L2(array1[i, :], array2[j, :]); otherwise the callable is invoked with the inputs to produce the cost.

And callback is an optional progress callback, called once per row of array1 (N times in total).
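To make the intended semantics concrete, a rough usage sketch (everything below is hypothetical; the proposed call is commented out since it doesn't exist yet):

import numpy as np

rng = np.random.default_rng(0)
detected = rng.random((1000, 3))      # N x K detected cell coordinates
truth = rng.random((1200, 3))         # M x K ground-truth cell coordinates

def l2_cost(a_row, b):
    # Default cost: Euclidean distance from one row of array1 to every row of
    # array2, computed on the fly so the full N x M matrix is never stored.
    return np.linalg.norm(b - a_row, axis=1)

def progress(i, n):
    if i % 100 == 0:
        print(f"{i}/{n} rows processed")

# Proposed call (hypothetical API):
# rows, cols = linear_sum_assignment(detected, truth, cost=l2_cost, callback=progress)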

Or, to start off with, there's no cost argument and L2 is assumed?

I can try my hand at implementing this if it seems acceptable.
