[Feature Request] Ideas from improving containers #370

mlandis · 2023-08-01T17:02:51Z

Here are a few ideas of container-based features, borrowed from R and Python, that we might be able to support in Rev.

Data tables

Real-world data table files (e.g. CSV files) often contain fields of different types (int, float, str, etc.).

RevBayes does not have a data table type, but it can read data table files as two-dimensional vectors. This works well when all values in a CSV file can be converted to the same type. For example:

$ cat example1.csv
col1,col2
0.1,0.3
0,0.2

is read as

> x1 = readDataDelimitedFile("example1.csv", delimiter=",", header=true)
> x1
   [ [ 0.1000, 0.3000 ] ,
     [ 0.0000, 0.2000 ] ]
> type(x1)
   MatrixRealPos

However, when a file contains multiple distinct types, the resulting type is a generic RevObject[][]. For example:

> x2 = readDataDelimitedFile("example2.csv", delimiter=",", header=true)
> x2

   RevObject[][] vector with 2 values
   ==================================

   [1]

   RevObject[] vector with 2 values
   ================================

   [1]
   0.1

   [2]
   cat




   [2]

   RevObject[] vector with 2 values
   ================================

   [1]
   0

   [2]
   NA

> type(x2)
   RevObject[][]

Part of the problem comes from relying on a two-dimensional vector to represent the table. Each row vector must have elements of a single type, so any row whose columns differ in type gets cast to the most generic type, RevObject.

A solution would be to add a DataTable object. It could store one vector per column (rather than per row), so each column keeps its own type, while also supporting more advanced ways of indexing (e.g. column names, slice-indexing, etc.).

Example of data table use:

x = readDataTable("my_file.txt", header=true, delimiter=",")
x[1:2, 3]
    height
    3.14
    3.21
x.col[:, ["height", "width"]]
    height  width
    3.14    7.13
    3.21    4.55
    3.77    4.74
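
To illustrate the column-wise storage idea outside of Rev, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `DataTable` class, its `from_csv`, `column`, and `select` methods, and the `col1,col2` header (the contents of example2.csv are not shown above, so only the values visible in the printed output are reused). It is not a proposal for the actual Rev interface.

```python
import csv
import io


def parse_column(values):
    """Convert a column of strings to int, then float, else keep as str."""
    for cast in (int, float):
        try:
            return [cast(v) for v in values]
        except ValueError:
            continue
    return list(values)


class DataTable:
    """Hypothetical column-oriented table: one typed list per column."""

    def __init__(self, names, columns):
        self.names = list(names)
        self.columns = dict(zip(self.names, columns))

    @classmethod
    def from_csv(cls, text, delimiter=","):
        rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
        header, body = rows[0], rows[1:]
        # Transpose rows into columns so each column is typed on its own.
        raw_columns = list(zip(*body))
        return cls(header, [parse_column(col) for col in raw_columns])

    def column(self, name):
        """Return the typed values of one column, looked up by name."""
        return self.columns[name]

    def select(self, names, rows=slice(None)):
        """Return a new table with the named columns and a slice of rows."""
        return DataTable(names, [self.columns[n][rows] for n in names])


# Assumed file contents; the header names are hypothetical.
text = "col1,col2\n0.1,cat\n0,NA\n"
table = DataTable.from_csv(text)
print(table.column("col1"))  # numeric column stays numeric
print(table.column("col2"))  # string column stays a column of strings
```

Because types are resolved per column, the numeric column is not dragged down to a generic type by the string column next to it.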

Slice-indexing

Currently, we can either access the entire vector or individual vector elements. It should be possible to add basic support for slice-indexing. Current behavior:

> y = [0, 1, 2, 3, 4]
> y[0:3]
   Error:	Argument or label mismatch for function call.
   Provided call:
   [] (Natural[]<constant> 'index' )

   Correct usage is:
   [] (Natural<any> index)

> y[ [0, 1, 2] ]
   Error:	Argument or label mismatch for function call.
   Provided call:
   [] (Natural[]<constant> 'index' )

   Correct usage is:
   [] (Natural<any> index)

Desired behavior:

> y = [0, 1, 2, 3, 4]
> y[0:3]
    [0, 1, 2]

> y[ [0, 1, 2] ]
    [0, 1, 2]
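
One way to think about the desired behavior is that the index operator dispatches on the type of its argument: a single position, a range, or a vector of positions. Below is a minimal Python sketch of that dispatch. The `Vector` class is hypothetical, and the sketch uses Python's zero-based, half-open ranges, which happens to match the desired output shown above. How ranges should be bounded in Rev is a separate design question.

```python
class Vector:
    """Hypothetical vector wrapper that accepts three kinds of index."""

    def __init__(self, values):
        self.values = list(values)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # y[0:3]: a contiguous range yields a new vector
            return Vector(self.values[index])
        if isinstance(index, (list, tuple)):
            # y[[0, 1, 2]]: an explicit list of positions yields a new vector
            return Vector(self.values[i] for i in index)
        # y[2]: a single position yields a single element
        return self.values[index]

    def __repr__(self):
        return f"Vector({self.values})"


y = Vector([0, 1, 2, 3, 4])
print(y[0:3])
print(y[[0, 1, 2]])
print(y[4])
```

The first two forms return a container and the last returns an element, so the return type depends on the index type. That is the part the current `[] (Natural<any> index)` signature cannot express.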

Dictionaries/maps

It'd be nice to be able to use dictionaries or maps as unordered containers. Ideally, keys and values could be of any type. For example:

x = Dictionary()
x["my_tree"] = readTrees("my_tree.tre")[1]
x["my_data"] = readDiscreteCharacterData("my_data.nex")

Dictionaries of containers (vectors or other dictionaries) could be useful, too.


bredelings · 2023-08-01T17:57:06Z

Pairs and tuples

Another thing that would be useful to have is tuples. Pairs are a special case: a 2-tuple.

Tuples differ from vectors in that each element of a tuple can have a different type, whereas every element of a vector must have the same type.

In C++, we have the type std::pair<T1,T2> for pairs. It would be nice to have the same thing in RevBayes.

If we have a type like Vector<Pair<String,Int>>, then this is one way to implement a dictionary, although not the most efficient one, since every lookup has to scan the vector.

dict = [("alice",1), ("bob",2)]
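
For concreteness, here is a small Python sketch of a dictionary built on a vector of pairs. The helper names `lookup` and `insert` are made up for illustration. Both scan the vector from the front, so each operation takes time proportional to the number of entries, which is why this is not the most efficient representation.

```python
def lookup(pairs, key):
    """Return the value stored under key by scanning the pairs in order."""
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)


def insert(pairs, key, value):
    """Overwrite the pair with a matching key, or append a new pair."""
    for i, (k, _) in enumerate(pairs):
        if k == key:
            pairs[i] = (key, value)
            return
    pairs.append((key, value))


pairs = [("alice", 1), ("bob", 2)]
print(lookup(pairs, "bob"))  # prints 2
```

A hash-based map would avoid the linear scan, but the pair-based version only needs tuples and vectors to exist in the language.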

Implicitly but strongly typed

Sebastian noted that this combination can be complicated. However, languages like Rust are implicitly but strongly typed (types are inferred rather than always written out), so there is a lot of prior art here.

bjoelle added the enhancement New feature or request label Apr 15, 2024
