pygmt.x2sys_cross: Refactor to use virtualfiles for output tables [BREAKING CHANGE: Dummy times in 3rd and 4th columns now have np.timedelta64 type] #3182

seisman · 2024-04-19T15:16:08Z

Description of proposed changes

This PR refactors the pygmt.x2sys_cross function to use virtualfiles for output tables. Note that x2sys_cross still uses temporary files in the tempfile_from_dftrack function.

Partially addresses #3160.

This PR introduces a breaking change: previously, the dummy times in the 3rd and 4th columns (with column names i_1/i_2) had object dtype (np.object); they now have np.timedelta64 dtype.
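For downstream code that relied on numeric dummy times, here is a minimal sketch of how the new dtype could be handled. The sample values and the assumption that dummy times are expressed in seconds are illustrative only and are not taken from this PR.

```python
import pandas as pd

# Hypothetical crossover table; only the two dummy-time columns are shown.
df = pd.DataFrame(
    {
        "i_1": pd.to_timedelta([18053.5, 13447.5], unit="s"),
        "i_2": pd.to_timedelta([13446.5, 71784.5], unit="s"),
    }
)

# The dummy-time columns are timedelta64 (NumPy dtype kind "m"), not object.
assert df["i_1"].dtype.kind == "m"

# Recover plain floating-point seconds if older code expects numbers.
seconds_1 = df["i_1"].dt.total_seconds()
seconds_2 = df["i_2"].dt.total_seconds()
```

Calling `.dt.total_seconds()` is one way to keep numeric workflows running without changing how the table is produced.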

seisman · 2024-04-20T03:36:55Z

pygmt/src/x2sys_cross.py

+                # Convert 3rd and 4th columns to datetimes.
+                # These two columns have names "t_1"/"t_2" or "i_1"/"i_2".
+                # "t_1"/"t_2" means they are datetimes and should be converted.
+                # "i_1"/"i_2" means they are dummy times (i.e., floating-point values).


Am I understanding the output correctly?

I've never used x2sys, but here is my understanding of the C codes and the output:

The 3rd and 4th columns are datetimes. They can be either absolute datetimes (e.g., 2023-01-01T01:23:45.678) or dummy datetimes (i.e., double-precision numbers), depending on whether the input tracks contain datetimes.

Internally, absolute datetimes are also represented as double-precision numbers in GMT. So absolute datetimes and dummy datetimes are the same internally.

When outputting to a file, GMT will convert double-precision numbers into absolute datetimes, since GMT knows whether the column has dummy datetimes or not.

A GMT_DATASET container can only contain double-precision numbers and text strings. So when outputting to a virtual file, the 3rd and 4th columns always have double-precision numbers. If the column names are t_1/t_2, then we know they're absolute datetimes and should be converted; otherwise, they are just dummy datetimes and should not be converted.
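The name-based rule described above could be sketched roughly as follows. This is not the code in this PR: `convert_time_columns` is a hypothetical helper, the sample tables are made up, and treating the numbers as seconds since the Unix epoch is an assumption.

```python
import pandas as pd


def convert_time_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Convert the 3rd and 4th columns based on their names (hypothetical helper)."""
    out = df.copy()
    for name in out.columns[2:4]:
        if name.startswith("t_"):
            # Absolute times: assumed here to be seconds since the Unix epoch.
            out[name] = pd.to_datetime(out[name], unit="s")
        # "i_1"/"i_2" hold dummy times and are left as floating-point values.
    return out


# Made-up table whose tracks carry absolute times.
absolute = pd.DataFrame(
    {"x": [251.0, 250.5], "y": [20.0, 20.0], "t_1": [0.0, 86400.0], "t_2": [3600.0, 7200.0]}
)
converted = convert_time_columns(absolute)

# Made-up table whose tracks carry only dummy times.
dummy = pd.DataFrame(
    {"x": [251.0, 250.5], "y": [20.0, 20.0], "i_1": [0.0, 1.0], "i_2": [2.0, 3.0]}
)
untouched = convert_time_columns(dummy)
```

Dispatching on the column name keeps the decision local to the wrapper, since the virtual file itself carries only double-precision numbers.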

I'm a little unsure if i_1/i_2 are actually dummy datetimes. This is a sample output from x2sys_cross:

# Tag: X2SYS4ivlhlo4
# Command: x2sys_cross @tut_ship.xyz -Qi -TX2SYS4ivlhlo4 ->/tmp/lala.txt
# x y i_1 i_2 dist_1 dist_2 head_1 head_2 vel_1 vel_2 z_X z_M
> @tut_ship 0 @tut_ship 0 NaN/NaN/1357.17 NaN/NaN/1357.17
251.004840022 20.000079064 18053.5647431 13446.6562433 333.339586673 229.636557499 269.996783034 270.023614846 NaN NaN 192.232797243 -2957.22757183
251.004840022 20.000079064 18053.5647431 71783.6562433 333.339586673 1148.20975878 269.996783034 270.023614846 NaN NaN 192.232797243 -2957.22757183
250.534946327 20.0000526811 18053.3762934 66989.0210846 332.869692978 1022.68273972 269.996783034 269.360150109 NaN NaN -57.6485957585 -2686.4268008
250.532033147 20.0000525175 18053.3751251 66988.9936489 332.866779797 1022.67977813 269.996783034 22.0133296951 NaN NaN -64.5973890802 -2682.04812157
252.068705 20.000075 13447.5 71784.5 230.700422496 1149.27362378 269.995072235 269.995072235 NaN NaN 0 -3206.5

It seems like the i_1/i_2 values vary between rows, but I can't quite remember what they represent... maybe an index of some sort? I might need to inspect the C code to see what's going on. Can you point me to where these i_1/i_2 columns are being output?

Dummy times are just double-precision indexes from 0 to n (xref: https://github.com/GenericMappingTools/gmt/blob/b56be20bee0b8de22a682fdcd458f9b9eeb76f64/src/x2sys/x2sys.c#L533).

The column name i_1 or t_1 is controlled by the variable t_or_i in the C code (https://github.com/GenericMappingTools/gmt/blob/b56be20bee0b8de22a682fdcd458f9b9eeb76f64/src/x2sys/x2sys_cross.c#L998). From https://github.com/GenericMappingTools/gmt/blob/b56be20bee0b8de22a682fdcd458f9b9eeb76f64/src/x2sys/x2sys_cross.c#L568, it's clear that, if got_time is True, then the column is absolute time (GMT_IS_ABSTIME), otherwise it's double-precision numbers (GMT_IS_FLOAT).

We can keep the dummy times as double-precision numbers, or treat them as seconds since the Unix epoch and then convert them to absolute times.

Maybe convert the relative time to pandas.Timedelta or numpy.timedelta64? Xref #2848.

Sounds good. Done in 9d12ae1.
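A minimal sketch of the timedelta conversion agreed on here, not the exact code from 9d12ae1. The values are copied from the i_1 column of the sample output quoted earlier; interpreting them as seconds is an assumption.

```python
import pandas as pd

# Dummy times taken from the i_1 column of the sample x2sys_cross output.
dummy = pd.Series([18053.5647431, 18053.3762934, 13447.5], name="i_1")

# Assumption: the dummy times are measured in seconds.
as_timedelta = pd.to_timedelta(dummy, unit="s")
print(as_timedelta.dtype)
```

Unlike an absolute datetime, a timedelta does not imply a calendar origin, which fits values that are only relative offsets.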

weiji14

There are 2 main changes happening in this PR:

Adding the output_type="numpy" option
Handling the different dtypes of the i_1/i_2 or t_1/t_2 columns

We can keep this as a single PR since it's hard to separate the two things, but might need to discuss the implementation a bit more.

weiji14 · 2024-04-21T21:56:14Z

pygmt/src/x2sys_cross.py

-def x2sys_cross(tracks=None, outfile=None, **kwargs):
+def x2sys_cross(
+    tracks=None,
+    output_type: Literal["pandas", "numpy", "file"] = "pandas",


Honestly, I'm not sure if we should support numpy output type for x2sys_cross because all 'columns' will need to be the same dtype in a np.ndarray. If there are datetime values in the columns, they will get converted to floating point (?), which makes it more difficult to use later. Try adding a unit test for numpy output_type and see if it makes sense.

If there are datetime values in the columns, they will get converted to floating point (?)

You're right. Datetimes are converted to floating-point values by df.to_numpy(). Will remove the numpy output type.
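The single-dtype limitation can be checked with a small made-up table. The exact resulting dtype depends on the conversion path: with recent pandas versions, a frame that mixes float and datetime columns comes back from `to_numpy()` as an object array. Either way, the per-column datetime dtype does not survive.

```python
import pandas as pd

# Made-up table mixing a float column with an absolute-time column.
df = pd.DataFrame(
    {
        "x": [251.0, 250.5],
        "t_1": pd.to_datetime(["2023-01-01T01:23:45", "2023-01-02T00:00:00"]),
    }
)

# Each column has its own dtype inside the DataFrame.
print(df.dtypes)

# A 2-D ndarray has exactly one dtype for all of its columns.
arr = df.to_numpy()
print(arr.dtype)
```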

weiji14 · 2024-04-21T22:26:35Z

pygmt/src/x2sys_cross.py

weiji14 · 2024-06-05T21:03:33Z

I'll give this a proper review over the weekend, a bit busy this week with some deadlines 🫠

weiji14

Cool, thanks also for handling the output differences between macOS and Linux (xref #3194). Pre-approving as the main logic around timedelta conversion checks out ok. Suggestions below are mostly documentation related or minor.

seisman force-pushed the refactor/x2sys_cross branch from 295afc0 to 5280524 Compare April 19, 2024 15:51

seisman added the enhancement and needs review labels Apr 19, 2024

seisman added this to the 0.12.0 milestone Apr 19, 2024

seisman marked this pull request as ready for review April 19, 2024 15:51

seisman requested a review from weiji14 April 19, 2024 15:51

seisman force-pushed the refactor/x2sys_cross branch 3 times, most recently from bc341f6 to ff290da Compare April 20, 2024 02:35

seisman added 4 commits April 20, 2024 10:50

pygmt.x2sys_cross: Add 'output_type' parameter for output in pandas/numpy/file formats

95fab98

Move session-unrelated code block outside the session block

ce926a0

Refactor if-else using match statements

86278cb

Fix static typing issue

58c6ea4

seisman force-pushed the refactor/x2sys_cross branch from ff290da to 58c6ea4 Compare April 20, 2024 02:55

seisman marked this pull request as draft April 20, 2024 03:03

seisman removed the needs review label Apr 20, 2024

Fix warnings

d6eeade

seisman marked this pull request as ready for review April 20, 2024 03:36

seisman commented Apr 20, 2024

weiji14 reviewed Apr 21, 2024

Convert dummpy times to timedelta

9d12ae1

seisman added the needs review label Apr 22, 2024

seisman mentioned this pull request Apr 22, 2024

Allow validate_output_table_type to specify the supported output types #3191

Merged

seisman added 3 commits April 22, 2024 14:02

Let validate_output_table_type specify the supported output types

28eb1df

Fix

5e926e8

Merge branch 'main' into validator/valid-types

c1c756d

seisman removed this from the 0.12.0 milestone Apr 29, 2024

seisman added 3 commits May 6, 2024 09:48

Update docstrings

3a3df0a

Merge branch 'main' into validator/valid-types

3aea9a6

Merge branch 'main' into validator/valid-types

d869a32

seisman marked this pull request as ready for review May 28, 2024 05:38

seisman added this to the 0.13.0 milestone May 28, 2024

seisman commented May 28, 2024

Update pygmt/src/x2sys_cross.py

870d9c7

seisman force-pushed the refactor/x2sys_cross branch 6 times, most recently from 13d36e4 to 5c7214d Compare May 28, 2024 12:37

Fix x2sys_cross tests on macOS M

db94b91

seisman force-pushed the refactor/x2sys_cross branch from 5c7214d to db94b91 Compare May 28, 2024 12:41

seisman added the needs review label May 28, 2024

seisman commented May 28, 2024

Update pygmt/src/x2sys_cross.py

de17d5e

seisman commented May 28, 2024

Update pygmt/src/x2sys_cross.py

ebce56e

seisman changed the title from "pygmt.x2sys_cross: Refactor to use virtualfiles for output tables" to "pygmt.x2sys_cross: Refactor to use virtualfiles for output tables [BREAKING CHANGE: Dummy times in 3rd and 4th columns now have np.timedelta64 type]" May 28, 2024

seisman added 2 commits May 29, 2024 13:07

Merge branch 'main' into refactor/x2sys_cross

71af717

Merge branch 'main' into refactor/x2sys_cross

cf2cfc7

seisman requested a review from weiji14 June 3, 2024 14:39

Merge branch 'main' into refactor/x2sys_cross

3a62fc1

weiji14 approved these changes Jun 9, 2024

seisman and others added 2 commits June 9, 2024 15:48

Apply suggestions from code review

9fd35ce

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

Update pygmt/tests/test_x2sys_cross.py

2b3474b

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

seisman merged commit 844594f into main Jun 9, 2024
18 of 20 checks passed

seisman deleted the refactor/x2sys_cross branch June 9, 2024 14:03

seisman removed the needs review label Jun 9, 2024
