More missing numpy and matlab #649

AngelEzquerra · 2024-04-25T22:04:38Z

This PR adds multiple functions that are available both in numpy and Matlab:

- union: returns the unique, unsorted Tensor of values that are found in either of the two input Tensors.
- intersection: returns the "intersection" of 2 Tensors as an unsorted rank-1 Tensor.
- setDiff: returns the (symmetric or non-symmetric) "difference" between 2 Tensors as an unsorted rank-1 Tensor.
- toHashSet: converts a Tensor into a HashSet.
- contains (and thus in and notin).
- ~~sub2ind and ind2sub: convert "subscript" sequences into linear tensor indexes and vice-versa.~~
- almostEqual: a function from Nim's std/math that Arraymancer did not support yet.

`union` returns the unique, unsorted Tensor of values that are found in either of the two input Tensors. Note that an equivalent function exists both in `numpy` (where it is called `union1d`) and in `Matlab` (where it is called `union`). However, those functions always sort the output, while Arraymancer's version does not. To replicate the same behavior, simply apply `sort` to the output of this function.
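The "unique but unsorted" behaviour described above can be illustrated with a minimal sketch that uses only Nim's standard library on plain sequences. The helper `unionSketch` is hypothetical and is not Arraymancer's implementation; it only shows why the result order is unspecified and how sorting afterwards recovers the ordering that `union1d` guarantees.

```nim
import std/[sets, algorithm]

# Hypothetical helper (not part of Arraymancer): collects the unique values
# found in either input. Iteration order of a HashSet is unspecified, so the
# result is unsorted.
proc unionSketch[T](a, b: openArray[T]): seq[T] =
  for x in toHashSet(a) + toHashSet(b):
    result.add x

let u = unionSketch([3, 1, 2], [2, 5, 3])

# Sorting the result reproduces the ordering numpy's union1d would return.
let s = sorted(u)
doAssert s == @[1, 2, 3, 5]
```

Leaving the sort to the caller means that code which only needs membership or uniqueness does not pay for an ordering it never uses.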

Vindaar · 2024-05-07T08:51:50Z

src/arraymancer/laser/tensor/initialization.nim

+  let data = toSeq(a)
+  result = toTensor(data, shape)


I suppose we could also do:

    result = newTensorUninit[T](a.card)
    for i, x in a:
      result[i] = x

to avoid the double copy (set -> seq -> tensor)

Makes sense. I'll make that change

Interestingly a performance measurement using timeit showed that the original version was faster on Windows. We will leave this as is for now while we investigate what is going on (and if we find out why we'll update this in a separate PR).

Yeah, let's investigate it further after this PR.

Vindaar · 2024-05-07T10:20:01Z

sub2ind is essentially getIndex https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/tensor/private/p_accessors.nim#L60, no? (just with seq input)

Maybe we can reuse the internal logic? In general sub2ind and ind2sub seem very Matlab focused concepts / names? I can see how it can be useful sometimes though. In Julia apparently they have since deprecated these for a CartesianIndex / LinearIndex object helper.

AngelEzquerra · 2024-05-07T18:20:29Z

> sub2ind is essentially getIndex https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/tensor/private/p_accessors.nim#L60, no? (just with seq input)
>
> Maybe we can reuse the internal logic? In general sub2ind and ind2sub seem very Matlab focused concepts / names? I can see how it can be useful sometimes though. In Julia apparently they have since deprecated these for a CartesianIndex / LinearIndex object helper.

You are right. Let me remove those from this PR. We can discuss offline and I can create another separate PR later.


`intersection` returns the "intersection" of 2 Tensors as an unsorted rank-1 Tensor. Note that an equivalent function exists both in `numpy` (where it is called `intersect1d`) and in `Matlab` (where it is called `intersect`). However, those functions always sort the output, while Arraymancer's version does not. To replicate the same behavior, simply apply `sort` to the output of this function. Also note that to implement this feature we moved (and made public) the existing, private toHashSet procedure from spatial/distances.nim into tensor/initialization.nim.

`setDiff` returns the (symmetric or non symmetric) "difference" between 2 Tensors as an unsorted rank-1 Tensor. Note that an equivalent function exists both in `numpy` (where it is called `setdiff1d`) and in `Matlab` (where it is called `setdiff`). However, those functions always sort the output, while Arraymancer's version does not. To replicate the same behavior, simply apply `sort` to the output of this function.
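As a rough illustration of the semantics of `intersection` and of the two `setDiff` modes, here is a minimal sketch using only `std/sets`. It assumes that "symmetric" means "values found in exactly one of the two inputs", which is the usual set-theoretic meaning; it does not use or reproduce Arraymancer's own code.

```nim
import std/[sets, sequtils, algorithm]

let a = toHashSet([1, 2, 3, 4])
let b = toHashSet([3, 4, 5])

let common  = a * b    # intersection: values present in both inputs
let onlyInA = a - b    # non-symmetric difference: in a but not in b
let inOne   = a -+- b  # symmetric difference: in exactly one input

# HashSet iteration order is unspecified, so sort before comparing, just as
# one would sort Arraymancer's unsorted output to match numpy or Matlab.
doAssert sorted(toSeq(common))  == @[3, 4]
doAssert sorted(toSeq(onlyInA)) == @[1, 2]
doAssert sorted(toSeq(inOne))   == @[1, 2, 5]
```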


AngelEzquerra · 2024-05-07T21:11:12Z

> sub2ind is essentially getIndex https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/tensor/private/p_accessors.nim#L60, no? (just with seq input)
>
> Maybe we can reuse the internal logic? In general sub2ind and ind2sub seem very Matlab focused concepts / names? I can see how it can be useful sometimes though. In Julia apparently they have since deprecated these for a CartesianIndex / LinearIndex object helper.

You are right. Let me remove those from this PR. We can discuss offline and I can create another separate PR later.

I've updated the PR as discussed here.

AngelEzquerra · 2024-05-12T10:26:45Z

I just pushed a fix to the original (non gemm) version which solves a problem with convolving a slice (was using unsafe_raw_data where I should have used unsafe_raw_offset). The change also rebases the branch on the latest main branch.

Vindaar · 2024-05-12T13:27:27Z

src/arraymancer/laser/tensor/initialization.nim

+# The folling export is needed to avoid an compilation error in
+# algorithms.nim/intersection() when running the test_algorithms test:
+# `Error: type mismatch - Expression: items(s1)`
+export sets


Note that we could also get away with adding a bind sets.items inside of intersection and setDiff. However, given that we now depend on sets, I don't see an issue exporting it too.


Vindaar · 2024-05-12T13:53:50Z

Merging, we can look into the toTensor conversion from a HashSet performance later.

AngelEzquerra force-pushed the more_missing_numpy_and_matlab branch 4 times, most recently from 02be765 to 28c8ec1 Compare April 29, 2024 12:27

Vindaar reviewed May 7, 2024


AngelEzquerra added 5 commits May 7, 2024 22:45

Add a version of toTensor that takes SomeSet as its input

cdbac42

This will let us avoid having to convert HashSets into seqs before converting them into tensors. Note that this also improves a little the docstrings of a couple of the existing `toTensor` procedures.

Add a contains function (and thus add support for in and notin)

6abfa3e

`find` (which is used to implement `contains`) was already supported (since `system.find` is generic and works with Tensors) but was untested, so this also adds a test for it.
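The mechanism mentioned in this commit message can be sketched with a stand-in container. `Bag` below is a hypothetical type used only for illustration (it is not a Tensor): because `system.find` is generic over anything that provides an `items` iterator, defining `contains` on top of it is enough to enable `in` and `notin`.

```nim
# Hypothetical stand-in container; any type with an `items` iterator works.
type Bag = object
  data: seq[int]

iterator items(b: Bag): int =
  for x in b.data:
    yield x

# `contains` built on the generic `system.find`, which returns -1 when the
# item is absent. Nim rewrites `x in b` and `x notin b` in terms of `contains`.
proc contains(b: Bag, x: int): bool =
  b.find(x) >= 0

let bag = Bag(data: @[10, 20, 30])
doAssert bag.find(20) == 1
doAssert bag.find(99) == -1
doAssert 20 in bag
doAssert 99 notin bag
```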

Add support for almostEqual

2c7752b

This was a useful std/math function that we did not support yet.
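For context, this is how the scalar `almostEqual` from `std/math` behaves. The sketch only exercises the standard-library function on floats; that the Tensor version added here applies the same comparison element-wise is an assumption, not something stated in this thread.

```nim
import std/math

let x = 0.1 + 0.2

# Exact comparison fails because of floating-point rounding...
doAssert x != 0.3
# ...while almostEqual tolerates a few units in the last place.
doAssert almostEqual(x, 0.3)
# Values that differ by more than rounding noise are still reported as different.
doAssert not almostEqual(1.0, 1.001)
```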

AngelEzquerra force-pushed the more_missing_numpy_and_matlab branch from 28c8ec1 to 381c5ba Compare May 7, 2024 21:10

AngelEzquerra force-pushed the more_missing_numpy_and_matlab branch from 381c5ba to 2c7752b Compare May 8, 2024 16:23

Vindaar reviewed May 12, 2024


Update src/arraymancer/laser/tensor/initialization.nim

7a478b6

Fix typo in export comment & add alternative for reader

Vindaar merged commit 2c4f2cd into mratsim:master May 12, 2024
6 checks passed
