Make unused replications warning more informative #2128

martsc1 · 2022-11-07T16:25:43Z

Enhancement.
Tested once without creating errors. Imo this adds useful additional information to the warning.

robertoostenveld · 2022-11-18T13:52:56Z

thanks for the suggestion.

Please use ft_warning with sprintf-like argument, e.g., like this

  ft_warning('Using only %d out of %d replications for the computation of the statistic', repsused, repstotal);

When I run the following

>> cfg = [];
>> cfg.ivar = 1
>> ft_statfun_indepsamplesT(cfg, randn(1,10), [1 1 1 2 2 2 2 3 3 3])

I get Warning: Using only 0 out of 1 replications for the computation of the statistic! whereas I had expected that it would tell me that I used 7 out of 10. So the counts don't appear to be correct.

robertoostenveld · 2022-11-18T13:55:10Z

I believe it should be

 ft_warning('Using only %d out of %d replications for the computation of the statistic', nrepl, size(design,2));

schoffelen · 2022-12-12T08:23:10Z

Hi Martin @martsc1 , could you make the change suggested by Robert, and test it? If it works as intended we can merge this PR!

martsc1 · 2022-12-12T10:34:44Z

I get the following error when I try to use the suggested changes with a data matrix instead of a vector.
I'm not sure however, if that is a bug or I misunderstood how the function is supposed to work.

cfg = [];
cfg.ivar = 1;
ft_statfun_indepsamplesT(cfg, rand(10,10), [1 1 1 2 2 2 2 3 3 3 ])

Error using warning
Formatted arguments must be scalar.

Error in ft_notification (line 354)
warning(msgId, varargin{:});

Error in ft_warning (line 63)
ft_notification(varargin{:});

Perhaps this could be a fix:
ft_warning('Using only %d out of %d replications for the computation of the statistic', nrepl(1), size(design,2));

This doesn't give me an error for either a data vector or matrix, and gives the expected result.

schoffelen · 2022-12-12T12:41:54Z

I see, good point. The statement is only strictly valid once all(nrepl==nrepl(1)). For now it is safe to assume this I think

martsc1 · 2022-12-12T12:57:53Z

I think it is actually not safe to assume that. If the data contains NaN values for instance.

I'm a bit confused now about what is being checked at this point in the function.
Is it the number of used units of observations (participants for example), or the number of valid data points?
For example, is it checking whether for each point in the design matrix there is some valid data, or out of all the data points, how many are valid.

So in the above example with the random 10x10 matrix, 7 out 10 units of observations are used, but 70 out of 70 data points are used. Which of the two should be checked here?

schoffelen · 2022-12-12T13:45:22Z

the data matrix in the ft_statfun_ is always datapoint vs. observation. The datapoint dimension is a combination of spatial, spectral (frequencies), and time points. In some situations, e.g. a TFR with a frequency-varying time-window, it could be that there are some observations that contain a nan (i.e. corresponding to a time point at which no estimate could be made due to not enough data in the time window). For other frequency bins this data could be available. This would only be an issue I think, once the statistics are performed within a single subjects, across trials. If it's a 'second level' test, when the observation dimension reflects subjects, then I think it is (at this moment) safe to assume: once-a-nan, always-a-nan.

we could consider to replace the nrepl(1) in the warning to mean(nrepl) and then replacing 'Using only', with Using on average'

martsc1 · 2022-12-12T14:38:34Z

I still don't understand whether a replication refers to a datapoint or an observatoin?

schoffelen · 2022-12-12T14:58:21Z

replication = observation, can be trial or subject, and is defined in the columns

martsc1 · 2022-12-12T15:37:58Z

Thanks for the clarification!
I have a dataset with powerspectrum data and a lot of NaN's due to the situation you described above. For each replication I have either 0 or the full number of observations, so in that case both nrepl(1) or mean(nrepl) would give misleading information. I don't know how common my situation is though, and whether that is a problem with my data rather than with the function.

martsc1 · 2022-12-14T10:12:00Z

How about this solution:

if any(nrepl<size(design,2))
    if ismember(0,nrepl)
        datapused=sum(nrepl==size(design,2));
        ft_warning('Using only %d out of %d datapoints for the computation of the statistic!', datapused, numel(nrepl));
        if any(nrepl(nrepl<size(design,2))>0)
            ft_warning('Using between %d and %d out of %d replications for the computation of the statistic',min(nrepl(nrepl>0)), max(nrepl), size(design,2));
        end
    else
        ft_warning('Using between %d and %d out of %d replications for the computation of the statistic', min(nrepl), max(nrepl), size(design,2));
    end
end

Now the function would warn about both unused replications and unused datapoints.
Alternatively, if this feels to clunky, a more minimalistic solution would be to not make any changes to the code but only change the warning text to 'Not all replications or datapoints are used for the computation of the statistic.'

schoffelen · 2022-12-14T10:20:32Z

Hi @martsc1 thanks for your perseverance, I don't mind being informative, but the risk exists that people get annoyed by the amount of warnings on their screen -> ft_statfun_ will be called many times when running permutation statistics, and that will lead to many warnings, unless the user's settings are tweaked a bit to suppress repeating messages. I don't know from the top of my head what the default is. But let's not worry about that here.

May I suggest a slight change of logic? You propose a nested series of if statements, which I think can be simplified to 2 non-nested if statements:

if any(nrepl==0)
% here I would change the message a bit: "computing the test statistic for %d out of %d datapoints"
end

and

if any(nrepl<size(design,2)
end

martsc1 · 2022-12-15T08:37:44Z

I'm happy with any suggestions. I think we need to modify it slightly to prevent getting replication warnings when not all datapoints are used, but all replications are. Maybe this could work:

if any(nrepl==0)
        datapused=sum(nrepl==size(design,2));
        ft_warning('Computing the test statistic for %d out of %d datapoints', datapused, numel(nrepl));
end

if any(nrepl(nrepl>0)<size(design,2))
       ft_warning('Using between %d and %d out of %d replications for the computation of the statistic',min(nrepl(nrepl>0)), max(nrepl), size(design,2));
end

Updated changes based on commit discussion.

whitespace

schoffelen · 2023-02-01T11:07:20Z

I think this is almost ready to be merged, were it not for the fact that for overall code consistency we might want to consider to also add similar heuristics to the other (in)depsamples statfun (F and T)

schoffelen · 2023-04-26T06:58:05Z

Hi Martin, @martsc1 would you be up for making similar suggestions for the other 'statfuns', so that we can push this forward, and don't lose the work that you have spent on this up until now?

martsc1 added 2 commits November 7, 2022 17:16

Make unused replications warning more informative

8b3a0f2

Update ft_statfun_indepsamplesT.m

1392121

martsc1 and others added 2 commits January 24, 2023 09:59

Update ft_statfun_indepsamplesT.m

a5412b5

Updated changes based on commit discussion.

Update ft_statfun_indepsamplesT.m

53936a5

whitespace

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make unused replications warning more informative #2128

Make unused replications warning more informative #2128

martsc1 commented Nov 7, 2022

robertoostenveld commented Nov 18, 2022

robertoostenveld commented Nov 18, 2022

schoffelen commented Dec 12, 2022 •

edited

martsc1 commented Dec 12, 2022 •

edited

schoffelen commented Dec 12, 2022

martsc1 commented Dec 12, 2022

schoffelen commented Dec 12, 2022

martsc1 commented Dec 12, 2022

schoffelen commented Dec 12, 2022

martsc1 commented Dec 12, 2022

martsc1 commented Dec 14, 2022 •

edited

schoffelen commented Dec 14, 2022

martsc1 commented Dec 15, 2022

schoffelen commented Feb 1, 2023

schoffelen commented Apr 26, 2023

Make unused replications warning more informative #2128

Are you sure you want to change the base?

Make unused replications warning more informative #2128

Conversation

martsc1 commented Nov 7, 2022

robertoostenveld commented Nov 18, 2022

robertoostenveld commented Nov 18, 2022

schoffelen commented Dec 12, 2022 • edited

martsc1 commented Dec 12, 2022 • edited

schoffelen commented Dec 12, 2022

martsc1 commented Dec 12, 2022

schoffelen commented Dec 12, 2022

martsc1 commented Dec 12, 2022

schoffelen commented Dec 12, 2022

martsc1 commented Dec 12, 2022

martsc1 commented Dec 14, 2022 • edited

schoffelen commented Dec 14, 2022

martsc1 commented Dec 15, 2022

schoffelen commented Feb 1, 2023

schoffelen commented Apr 26, 2023

schoffelen commented Dec 12, 2022 •

edited

martsc1 commented Dec 12, 2022 •

edited

martsc1 commented Dec 14, 2022 •

edited