Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commit 23d5855 broke 3dRetinoPhase "delay" based estimation #556

Open
bogpetre opened this issue Dec 31, 2023 · 1 comment
Open

Commit 23d5855 broke 3dRetinoPhase "delay" based estimation #556

bogpetre opened this issue Dec 31, 2023 · 1 comment

Comments

@bogpetre
Copy link

bogpetre commented Dec 31, 2023

If running 3dRetinoPhase with Delay based estimation the program will segfault.

3dRetinoPhase is the afni program used for estimating retinotopic maps based on stereotyped stimuli (expanding circles and wedges sweeping around the fixation point). It appears to be one of the most convenient tools available for doing so. There are two methods for this estimation, once uses the fast fourier transform (FFT) to estimate the phase of voxel-wise responses relative to the wavelength with maximal power, the other uses the a hilbert transform to more precisely estimate the phase offset between a reference timeseries and the observed BOLD timeseries. The latter is the more modern and accurate approach. It is also the approach precluded by this segfault.

After the segfault, the error logs are uninformative, but for the record look like this:


*********------ CRASH LOG ------------------------------***********
Fatal Signal 11 (SIGSEGV) received
.......... recent internal history .........................................
++++++mri_killpurge [6]: {ENTRY (file=mri_purger.c line=259) from mri_free {232 ms}
      mri_killpurge -- check if im==NULL ptr=0x558f3d26f670
      mri_killpurge -- can't killpurge NULL fname! {232 ms}
------mri_killpurge [6]: EXIT} (file=mri_purger.c line=270) to mri_free {232 ms}
     mri_free -- free im {232 ms}
-----mri_free [5]: EXIT} (file=mri_free.c line=69) to THD_init_datablock_brick {232 ms}
+++++mri_free [5]: {ENTRY (file=mri_free.c line=49) from THD_init_datablock_brick {232 ms}
     mri_free -- call killpurge {232 ms}
++++++mri_killpurge [6]: {ENTRY (file=mri_purger.c line=259) from mri_free {232 ms}
      mri_killpurge -- check if im==NULL ptr=0x558f3d26f730
      mri_killpurge -- can't killpurge NULL fname! {232 ms}
------mri_killpurge [6]: EXIT} (file=mri_purger.c line=270) to mri_free {232 ms}
     mri_free -- free im {232 ms}
-----mri_free [5]: EXIT} (file=mri_free.c line=69) to THD_init_datablock_brick {232 ms}
+++++mri_free [5]: {ENTRY (file=mri_free.c line=49) from THD_init_datablock_brick {232 ms}
     mri_free -- call killpurge {232 ms}
++++++mri_killpurge [6]: {ENTRY (file=mri_purger.c line=259) from mri_free {232 ms}
      mri_killpurge -- check if im==NULL ptr=0x558f3d26f7f0
      mri_killpurge -- can't killpurge NULL fname! {232 ms}
------mri_killpurge [6]: EXIT} (file=mri_purger.c line=270) to mri_free {232 ms}
     mri_free -- free im {232 ms}
-----mri_free [5]: EXIT} (file=mri_free.c line=69) to THD_init_datablock_brick {232 ms}
    THD_init_datablock_brick -- making new dblk->brick {232 ms}
    THD_init_datablock_brick -- starting sub-brick creations {232 ms}
+++++mri_new_7D_generic [5]: {ENTRY (file=mri_new.c line=48) from THD_init_datablock_brick {232 ms}
-----mri_new_7D_generic [5]: EXIT} (file=mri_new.c line=143) to THD_init_datablock_brick {232 ms}
+++++mri_new_7D_generic [5]: {ENTRY (file=mri_new.c line=48) from THD_init_datablock_brick {232 ms}
-----mri_new_7D_generic [5]: EXIT} (file=mri_new.c line=143) to THD_init_datablock_brick {232 ms}
    THD_init_datablock_brick -- exiting {232 ms}
----THD_init_datablock_brick [4]: EXIT} (file=thd_initdblk.c line=1268) to EDIT_dset_items {232 ms}
---EDIT_dset_items [3]: EXIT} (file=edt_dsetitems.c line=949) to 3dRetinoPhase main {232 ms}
............................................................................
  3dRetinoPhase main
** AFNI compile date = Dec 30 2023
** [[Precompiled binary linux_centos_7_64: Dec 30 2023]]
** Program Crash **

The cause of this crash is the dereferencing of a NULL pointer, a bug introduced with an update from late 2022 (commit @23d5855).

To be more precise, 3dRetinoPhase initializes the Delay based method of estimation by invoking hilbertdelay_V2reset(); This is just a simple wrapper for hilbertdelay_V2() which it calls with opt=0, *del=NULL and *xcorCoef=Null arguments, among others. This combination of parameters results in the dereferencing of the del and xcorCoeff pointers on lines 1292 and 1293 of plug_delay_V2.h when an attempt is made to assign them values, but for null pointers this operation causes a segmentation fault. Prior to commit 23d5855 the opt=0 argument caused hilbertdelay_V2() to return before reaching this point, so this bug where a null pointer is dereferenced was never encountered.

hilbertdelay_V2reset() doesn't appear to be called anywhere else besides in 3dRetinoPhase. For at least this use case it may be possible to address the problem by allocating memory for these otherwise NULL variables and passing their pointers into the hilbertdelay_V2() invocation inside hilbertdelay_V2resest() like so,

static void hilbertdelay_V2reset() {
  float del=0.0, slp=0.0, xcor=0.0, xcorCoef=0.0,vx=0.0,vy=0.0;
   hilbertdelay_V2 (NULL, NULL, 0, 0, 0, 0, 0, 
                     0.0, 0, &del, &slp, &xcor, &xcorCoef, &vx, &vy);
}

instead of the current definition,

static void hilbertdelay_V2reset() {
   hilbertdelay_V2 (NULL, NULL, 0, 0, 0, 0, 0, 
                     0.0, 0, NULL, NULL, NULL, NULL, NULL, NULL);
}

Alternatively you can wrap the lines

   *del = NoWayDelay;              /* initialize to an unlikely value ...*/
   *xcorCoef = NoWayxcorCoef;          

with a conditional that checks if del and xcorCoef are null before running these lines.

Neither is a complete solution. They will allow the program to run and produce sensible output, but they won't be graceful. Various other "errors" will be printed to stdout (without terminating the program). Someone with a deeper understanding of afni than I should have a look and determine what the best solution might be.

I appreciate that these issues may take a while to be resolve. In the meantime, if other users need to use 3dRetinoPhase with delay based estimation, I suggest downgrading afni to one of the early 2022 versions (22.2.07 seems to be the last version without this issue) or alternatively modify the code yourselves and recompile it with a temp fix like the above.

@bogpetre
Copy link
Author

bogpetre commented Jan 2, 2024

Here's a potential fix (patch below). According to PT there are scenarios where opt=0 but it is still worthwhile to attempt some computations. The hilbertdelay_V2 function options aren't clearly documented, so I'm trying to deduce what the parameters are based on their usage, but it appears to me that the x and y variables are the timeseries and reference data (respectively) that are needed to make those attempted computations. In the case of hilbertdelay_V2reset() these values are NULL, so there's nothing to attempt. One potential solution to the NULL del and xcorCoef pointers then would be to determine if there's anything to compute after cleaning up all the other variables in the opt=0 block from lines 1245 - 1289, and if the x and y variables are NULL, returning at that point. This implicitly avoids dereferencing the NULL del and xcorCoef pointers since x and y should be NULL anytime del and xcorCoef are NULL.

It's not my favorite solution, because of the implicit nature of that check, but it's hard to imagine a scenario where it would desirable to continue evaluating hilberdelay_V2() without any timeseries data to work with, or conversely a scenario where the xcorCoef values are NULL but where valid x and y values are still provided. I've added an additional explicit check for the nullity of the del and xcorCoef pointers after the x and y variable values are checked just in case some future update results in this inconceivable scenario though. In that scenario an error is thrown. Have a look and see what you think.

--- afni_orig/src/plug_delay_V2.h	2024-01-02 10:32:45.951153000 -0500
+++ afni/src/plug_delay_V2.h	2024-01-02 10:52:50.116496000 -0500
@@ -1288,9 +1288,24 @@
       //return (0);
    }
 
-   
-   *del = NoWayDelay;              /* initialize to an unlikely value ...*/
-   *xcorCoef = NoWayxcorCoef;          
+   /* [BP: Jan 2, 2024] if x or y are NULL there's nothing more to do. This 
+      scenario occurs when when this function is called by 
+      hilbertdelay_V2reset. In this case del and xcorCoef pointers are also
+      NULL, which will cause a segfault when attempting to
+      dereference them in the next few lines of code, but because there's
+      nothing to evaluate, PT's logic re: 3dDelay above shouldn't apply.
+   */
+   if (x == NULL || y == NULL) {
+     return (0);
+   } 
+   else if (del == NULL || xcorCoef == NULL ) {
+     sprintf (buf,"Cannot evaluate with NULL delay or xcorr coef.\n");
+     error_message("hilbertdelay_V2",buf,0);
+     return (ERROR_OPTIONS);
+   } else {
+     *del = NoWayDelay;              /* initialize to an unlikely value ...*/
+     *xcorCoef = NoWayxcorCoef;          
+   }
 
 
    /*--------------------------------------------------------------------------*/

bogpetre added a commit to bogpetre/afni that referenced this issue May 1, 2024
3dRetinoPhase initializes the Delay based method of estimation by invoking hilbertdelay_V2reset(); This is just a simple wrapper for hilbertdelay_V2() which it calls with opt=0, *del=NULL and *xcorCoef=Null arguments, among others. This combination of parameters results in the dereferencing of the del and xcorCoeff pointers on lines 1292 and 1293 of plug_delay_V2.h when an attempt is made to assign them values, but for null pointers this operation causes a segmentation fault.

Instead we can determine if there's anything to compute after cleaning up all the other variables in the opt=0 block from lines 1245 - 1289, and if the x and y variables are NULL then we return at that point. This implicitly avoids dereferencing the NULL del and xcorCoef pointers since x and y should be NULL anytime del and xcorCoef are NULL.

This issue is discussed more fully in Issue afni#556
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant