The module lacks a 'where' procedure. #486

Open
filipeclduarte opened this issue Jan 3, 2021 · 5 comments

@filipeclduarte
Contributor

I was looking for a procedure similar to numpy.where() from Python's NumPy, but I could not find one. So I think it would be beneficial to add a procedure that does this task, i.e., returns the indices where the condition is true.

https://github.com/numpy/numpy/blob/1b8b46b3f2f68f5be8e52e798eb91c2ac5952745/numpy/ma/core.py#L7249

@Clonkk
Contributor

Clonkk commented Jan 4, 2021

Isn't this a direct application of apply_inline / map_inline ?

var xT = randomTensor[float64]([2, 3], 100.0)
xT.apply_inline:
  if x > 10:
    2*x+5
  else: 
    x*10
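
For comparison, the snippet above matches the three-argument form numpy.where(condition, x, y), which picks each output element from one of two inputs. A minimal plain-Nim sketch of that form follows; `whereSelect` is a hypothetical helper name used only for illustration and is not part of Arraymancer.

```nim
import std/sequtils

# Hypothetical helper (not an Arraymancer proc): element-wise selection,
# analogous to numpy.where(cond, a, b) for 1D inputs of equal length.
proc whereSelect[T](cond: openArray[bool], a, b: openArray[T]): seq[T] =
  doAssert cond.len == a.len and cond.len == b.len
  result = newSeq[T](cond.len)
  for i in 0 ..< cond.len:
    # Take the element from `a` where the condition holds, else from `b`.
    result[i] = if cond[i]: a[i] else: b[i]

let xs = @[4.0, 12.0, 10.0, 25.0]
let picked = whereSelect(xs.mapIt(it > 10.0),
                         xs.mapIt(2.0 * it + 5.0),
                         xs.mapIt(it * 10.0))
echo picked
```

Unlike apply_inline, this sketch evaluates both branches for every element before selecting, which is also how the NumPy call behaves.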

@Vindaar
Collaborator

Vindaar commented Jan 4, 2021

@Clonkk numpy.where has multiple use cases depending on the arguments (yeah, that's freaking annoying indeed). From the docs https://numpy.org/doc/stable/reference/generated/numpy.where.html:

Note
When only condition is provided, this function is a shorthand for np.asarray(condition).nonzero().

If one passes only the condition, it returns the indices at which the condition is true.
Thus, it's served neither by masked/index_select, because these return / use boolean masks, nor by apply/map_inline, because one only wants the indices and does not want to apply anything to the elements.

One can always approximate where in this use case like this:

let t = ... # some tensor t which we want to use `where` for
let cond = ... # our boolean condition, a Tensor[bool]
let idx = toSeq(0 ..< t.size).toTensor() # flat indices 0 ..< t.size (needs `import sequtils`)
let wT = idx.masked_select(cond) # or idx[cond]

to get the global indices. For 1D tensors that's the same as numpy.where but for ND of course we would want to get indices along each dimension.
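
To make the difference between global (flat) indices and per-dimension indices concrete, here is a small sketch in plain Nim without any Arraymancer calls. The names `whereFlat` and `unravel` are hypothetical, and the conversion assumes a row-major (C order) layout.

```nim
# Hypothetical helpers in plain Nim; none of these are Arraymancer procs.

proc whereFlat(cond: openArray[bool]): seq[int] =
  ## Flat indices at which `cond` is true (the 1D case of numpy.where).
  for i, c in cond:
    if c: result.add i

proc unravel(flat: int, shape: openArray[int]): seq[int] =
  ## Converts a flat index into one index per axis, assuming row-major order.
  result = newSeq[int](shape.len)
  var rest = flat
  for ax in countdown(shape.high, 0):
    result[ax] = rest mod shape[ax]
    rest = rest div shape[ax]

# A small 3x3 example, flattened in row-major order.
let data = @[3, 0, 0,
             0, 4, 0,
             5, 6, 0]
var cond: seq[bool]
for v in data: cond.add(v != 0)

let flat = whereFlat(cond)
echo flat
for f in flat:
  echo f, " -> ", unravel(f, [3, 3])
```

For a 1D input the flat indices are already the final answer; for ND input each flat index has to be split per axis as `unravel` does here.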

The equivalent would be easy to do if we had nonzero. Well, we kind of do, but because I can't finish what I start apparently not. I added it here #447:

proc nonzero*[T](t: Tensor[T]): Tensor[int] =
  ## Returns the indices, which are non zero as a `Tensor[int]`.
  ##
  ## The resulting tensor is 2 dimensional and has one element for each
  ## dimension in ``t``. Each of those elements contains the indices along
  ## the corresponding axis (element 0 == axis 0), which are non zero.
  ##
  ## Input:
  ##   - A tensor
  ##
  ## Returns:
  ##   - A 2D tensor with N elements, where N is the rank of ``t``
  ##
  ## Example:
  ##    .. code:: nim
  ##       let a = [[3, 0, 0],
  ##                [0, 4, 0],
  ##                [5, 6, 0]].toTensor()
  ##       assert a.nonzero == [[0, 1, 2, 2], [0, 1, 0, 1]].toTensor
  ##       #                    ^-- indices.. ^ ..for axis 0
  ##       #                                  |-- indices for axis 1
  ##       # axis 0: [0, 1, 2, 2] refers to:
  ##       # - 0 -> 3 in row 0
  ##       # - 1 -> 4 in row 1
  ##       # - 2 -> 5 in row 2
  ##       # - 2 -> 6 in row 2
  ##       # axis 1: [0, 1, 0, 1] refers to:
  ##       # - 0 -> 3 in col 0
  ##       # - 1 -> 4 in col 1
  ##       # - 0 -> 5 in col 0
  ##       # - 1 -> 6 in col 1
  var count = 0 # number of non zero elements
  let mask = map_inline(t):
    block:
      let cond = x != 0.T
      if cond:
        inc count
      cond

  result = newTensor[int]([t.shape.len, count])
  var ax = 0 # current axis
  var k = 0 # counter for indices in one axis
  for idx, x in mask:
    if x:
      ax = 0
      for j in idx:
        result[ax, k] = j
        inc ax
      inc k

but I still haven't gotten around to finishing it (sorry :( ).
I'll finish up what I'm doing now and then wrap up that PR soon.
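
As a rough illustration of how a condition-only `where` relates to nonzero, the sketch below works on nested seqs in plain Nim rather than on tensors. `where2D` is a hypothetical name, and the predicate argument is an assumption made for this example; it is not how the PR is implemented.

```nim
# Hypothetical helper in plain Nim; not the Arraymancer implementation.
proc where2D[T](rows: seq[seq[T]],
                pred: proc (v: T): bool): array[2, seq[int]] =
  ## result[0] holds the row indices and result[1] the column indices of
  ## every element for which `pred` returns true.
  for i, row in rows:
    for j, v in row:
      if pred(v):
        result[0].add i
        result[1].add j

let a = @[@[3, 0, 0],
          @[0, 4, 0],
          @[5, 6, 0]]

# nonzero is the special case where the predicate is `v != 0`.
let nz = where2D(a, proc (v: int): bool = v != 0)
echo nz

# Any other condition works the same way.
let big = where2D(a, proc (v: int): bool = v > 3)
echo big
```

The first call reproduces the per-axis layout from the docstring example above: one sequence of indices per axis, with matching positions describing one element.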

@Clonkk
Contributor

Clonkk commented Jan 4, 2021

Interesting, I've never used np.where that way.

@Vindaar
Collaborator

Vindaar commented Jan 4, 2021

Interesting, I've never used np.where that way.

I've only ever used np.where that way, haha.

@filipeclduarte
Contributor Author

filipeclduarte commented Jan 4, 2021

@Vindaar this helps with rank-1 tensors. numpy.where is a very useful function; I didn't know all of its functionality.

mratsim added the feature label Jan 5, 2021