[Feature Request] Ideas for improving containers #370

Open
mlandis opened this issue Aug 1, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

mlandis (Member) commented Aug 1, 2023

Here are a few ideas of container-based features, borrowed from R and Python, that we might be able to support in Rev.

Data tables

Real-world data table files (e.g. CSV files) may contain fields of different types (int, float, str, etc.).

RevBayes does not have a data table type, but it can read data table files as two-dimensional vectors. This works well when all values in a CSV file can be converted into the same type. For example:

$ cat example1.csv
col1,col2
0.1,0.3
0,0.2

is read as

> x1 = readDataDelimitedFile("example1.csv", delimiter=",", header=true)
> x1
   [ [ 0.1000, 0.3000 ] ,
     [ 0.0000, 0.2000 ] ]
> type(x1)
   MatrixRealPos

However, when a file contains values of different types (here, numbers alongside text such as cat), the result is a generic RevObject[][]. For example:

> x2 = readDataDelimitedFile("example2.csv", delimiter=",", header=true)
> x2

   RevObject[][] vector with 2 values
   ==================================

   [1]

   RevObject[] vector with 2 values
   ================================

   [1]
   0.1

   [2]
   cat

   [2]

   RevObject[] vector with 2 values
   ================================

   [1]
   0

   [2]
   NA

> type(x2)
   RevObject[][]

Part of the problem is that the table is represented as a two-dimensional vector, i.e. a vector of row vectors. Every element of a row vector must have the same type, so any row whose columns hold different types is cast to the most generic type, RevObject.
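A minimal Python sketch of the promotion problem described above. It assumes a deliberately simplified, hypothetical type rule (integers widen to RealPos or Real; any other mismatch falls back to RevObject) and hypothetical helper names; it is not RevBayes's actual conversion code. Grouping the same fields by row loses the types, while grouping them by column keeps them:

```python
# Simplified, hypothetical widening rules between Rev-like numeric types.
WIDENS = {
    ("Natural", "RealPos"): "RealPos",
    ("Natural", "Real"): "Real",
    ("RealPos", "Real"): "Real",
}


def infer_type(token: str) -> str:
    """Guess a Rev-like type name for one CSV field (simplified)."""
    try:
        value = float(token)
    except ValueError:
        return "String"  # e.g. "cat"; "NA" is also treated as text here
    if value >= 0 and value.is_integer() and "." not in token:
        return "Natural"
    return "RealPos" if value >= 0 else "Real"


def common_type(types: list[str]) -> str:
    """Shared type of a non-empty list of type names, else RevObject."""
    result = types[0]
    for t in types[1:]:
        if t != result:
            result = WIDENS.get((result, t)) or WIDENS.get((t, result)) or "RevObject"
    return result


def row_types(rows: list[list[str]]) -> list[str]:
    """Element type of each row vector (the current, row-oriented layout)."""
    return [common_type([infer_type(f) for f in row]) for row in rows]


def column_types(rows: list[list[str]]) -> list[str]:
    """Element type of each column vector (a column-oriented layout)."""
    return [common_type([infer_type(f) for f in col]) for col in zip(*rows)]


example1 = [["0.1", "0.3"], ["0", "0.2"]]
example2 = [["0.1", "cat"], ["0", "NA"]]
print(row_types(example1))     # ['RealPos', 'RealPos']
print(row_types(example2))     # ['RevObject', 'RevObject']
print(column_types(example2))  # ['RealPos', 'String']
```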

One solution would be to add a DataTable type that stores the table column-wise (one vector per column, not per row), so that each column keeps its own type. It could also support more advanced ways of indexing (e.g. by column name, slice-indexing, etc.).

Example of data table use:

x = readDataTable("my_file.txt", header=true, delimiter=",")
x[1:2, 3]
    height
    3.14
    3.21
x.col[:, ["height", "width"]]
    height  width
    3.14    7.13
    3.21    4.55
    3.77    4.74
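A rough Python sketch of the column-oriented layout proposed above. The class name DataTable, its methods, the 1-based positions, and the "all floats or else strings" conversion rule are assumptions for illustration only; readDataTable and the indexing syntax in the example do not exist yet, so this just mirrors the intended behavior on a toy input:

```python
import csv
import io
from typing import Any, Dict, List, Sequence, Union

Key = Union[int, str]  # a column name or a 1-based column position


def _convert_column(raw: List[str]) -> List[Any]:
    """Parse a whole column as floats if every field allows it, else keep strings."""
    try:
        return [float(v) for v in raw]
    except ValueError:
        return raw


class DataTable:
    """Stores one list per column, so each column keeps its own element type."""

    def __init__(self, columns: Dict[str, List[Any]]):
        if len({len(v) for v in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = dict(columns)  # dicts keep insertion order = column order
        self.names = list(columns)

    @classmethod
    def from_csv_text(cls, text: str) -> "DataTable":
        reader = csv.reader(io.StringIO(text))
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines
        return cls({name: _convert_column([row[i] for row in rows])
                    for i, name in enumerate(header)})

    def _name(self, key: Key) -> str:
        return self.names[key - 1] if isinstance(key, int) else key

    def column(self, key: Key) -> List[Any]:
        """Look up a column by name or by 1-based position."""
        return self.columns[self._name(key)]

    def select(self, rows: Sequence[int], cols: Sequence[Key]) -> "DataTable":
        """New table with the given 1-based rows and columns (no bounds checks)."""
        return DataTable({self._name(c): [self.column(c)[r - 1] for r in rows]
                          for c in cols})


# Toy input; the name column is made up for illustration.
x = DataTable.from_csv_text("name,height,width\na,3.14,7.13\nb,3.21,4.55\nc,3.77,4.74\n")
print(x.column("name"))                        # ['a', 'b', 'c']
print(x.select([1, 2], ["height"]).columns)    # {'height': [3.14, 3.21]}
```

The text column stays a list of strings while the numeric columns become floats, which is exactly the per-column typing that a vector of rows cannot provide.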

Slice-indexing

Currently, we can access either an entire vector or a single element of it, but not a subset. It should be possible to add basic support for slice-indexing. Current behavior:

> y = [0, 1, 2, 3, 4]
> y[0:3]
   Error:	Argument or label mismatch for function call.
   Provided call:
   [] (Natural[]<constant> 'index' )

   Correct usage is:
   [] (Natural<any> index)

> y[ [0, 1, 2] ]
   Error:	Argument or label mismatch for function call.
   Provided call:
   [] (Natural[]<constant> 'index' )

   Correct usage is:
   [] (Natural<any> index)

Desired behavior:

> y = [0, 1, 2, 3, 4]
> y[0:3]
    [0, 1, 2]

> y[ [0, 1, 2] ]
    [0, 1, 2]
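A minimal Python sketch of one way to get both forms at once: an overload of [] that accepts a vector of indices (the Natural[] argument the error message rejects), since a range like 0:3 already evaluates to such a vector. The helper names are hypothetical. The sketch assumes Rev's 1-based positions (as in readTrees(...)[1]) and inclusive a:b ranges, so the first three elements are y[1:3]; the desired output above follows Python's 0-based, end-exclusive convention instead, so the convention would still need to be settled.

```python
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


def rev_range(start: int, end: int) -> List[int]:
    """Inclusive range like Rev's a:b, counting down when end < start."""
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))


def rev_index(values: Sequence[T], index: Union[int, Sequence[int]]) -> Union[T, List[T]]:
    """Index with one 1-based position, or with a vector of them (a slice)."""
    positions = [index] if isinstance(index, int) else list(index)
    for i in positions:
        if not 1 <= i <= len(values):
            raise IndexError(f"index {i} is outside 1..{len(values)}")
    picked = [values[i - 1] for i in positions]
    # A scalar index returns one element; a vector index returns a new vector.
    return picked[0] if isinstance(index, int) else picked


y = [0, 1, 2, 3, 4]
print(rev_index(y, rev_range(1, 3)))  # [0, 1, 2]
print(rev_index(y, [1, 2, 3]))        # [0, 1, 2]
print(rev_index(y, 5))                # 4
```

Because a vector index may also repeat or reorder positions (e.g. [3, 1, 1]), the same overload would cover general gathers, not only contiguous slices.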

Dictionaries/maps

It'd be nice to be able to use dictionaries or maps as unordered containers. Ideally, keys and values could be of any type. For example:

x = Dictionary()
x["my_tree"] = readTrees("my_tree.tre")[1]
x["my_data"] = readDiscreteCharacterData("my_data.nex")

Dictionaries of containers (vectors or other dictionaries) could be useful, too.
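For comparison, Python's built-in dict already has the semantics described here: values of any type, including other containers, stored under arbitrary hashable keys. In the sketch below the placeholder values stand in for the tree and the character data (readTrees and readDiscreteCharacterData are Rev functions, not Python ones). One consideration carries over to any implementation: keys of "any" type still need to support hashing (for a hash map) or ordering (for a tree map such as C++'s std::map); Python, for example, rejects mutable lists as keys.

```python
# Placeholder values; in Rev these would come from readTrees(...)[1] and
# readDiscreteCharacterData(...). Both are hypothetical stand-ins here.
my_tree = "((A,B),C);"
my_data = {"A": "0110", "B": "0100", "C": "1001"}

x = {}
x["my_tree"] = my_tree
x["my_data"] = my_data
x["run_ids"] = [1, 2, 3]                # a vector as a value
x["settings"] = {"generations": 10000}  # a nested dictionary
x[("A", "B")] = 0.5                     # a tuple works as a key too

print(x["settings"]["generations"])  # 10000

try:
    x[["A", "B"]] = 0.5  # lists are mutable, hence unhashable
except TypeError as err:
    print("rejected key:", err)
```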

bredelings (Contributor) commented

Pairs and tuples

Tuples would also be useful. A pair is the special case of a 2-tuple.

Tuples differ from vectors because each element of a tuple can have a different type, whereas every element of a vector must have the same type.

In C++, we have the type std::pair<T1,T2> for pairs. It would be nice to have the same thing in RevBayes.
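A minimal Python analogue of std::pair<T1,T2>, as a sketch: a generic, immutable record with two independently typed slots. The name Pair and the fields first/second are borrowed from std::pair for illustration. Python's type parameters are only checked by static tools such as mypy, not at run time, so this shows the shape of the type rather than its enforcement.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar

T1 = TypeVar("T1")
T2 = TypeVar("T2")


@dataclass(frozen=True)
class Pair(Generic[T1, T2]):
    """Two slots with independent types, like C++'s std::pair<T1, T2>."""
    first: T1
    second: T2


# A vector: every element shares one type (int here).
counts: List[int] = [1, 2, 3]

# A pair: each slot has its own type (str and int here).
entry: Pair[str, int] = Pair("alice", 1)
print(entry.first, entry.second)  # alice 1
```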

A type like Vector<Pair<String,Int>> would give one way to implement a dictionary, although not the most efficient one, since each lookup has to scan the pairs one by one:

dict = [("alice",1), ("bob",2)]
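To make the efficiency remark concrete, here is a Python sketch of a dictionary stored as a vector of (key, value) pairs, with a hypothetical lookup helper. Each lookup scans the pairs in order, so it costs O(n); a hash-based map such as Python's dict finds a key in O(1) time on average.

```python
from typing import List, Optional, Tuple


def lookup(pairs: List[Tuple[str, int]], key: str) -> Optional[int]:
    """Linear scan over (key, value) pairs; the first matching key wins."""
    for k, v in pairs:
        if k == key:
            return v
    return None  # key absent


dict_as_vector = [("alice", 1), ("bob", 2)]
print(lookup(dict_as_vector, "bob"))    # 2
print(lookup(dict_as_vector, "carol"))  # None

# Converting to a hash map makes lookups O(1) on average.
as_hash_map = dict(dict_as_vector)
print(as_hash_map["alice"])  # 1
```

One semantic detail to settle with this representation: with duplicate keys, the scan returns the first match, whereas dict(...) keeps the last one.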

Implicitly but strongly typed

Sebastian noted that combining implicit typing with strong typing can be complicated. However, languages like Rust are both implicitly typed (through type inference) and strongly typed, so there is a lot of prior art here.

bjoelle added the enhancement (New feature or request) label on Apr 15, 2024