
RobustICA behaviour on non-convergence #1047

Open
Lestropie opened this issue Feb 26, 2024 · 3 comments

@Lestropie

I want to discuss a confound intrinsic to #1013 which, given its esoteric nature, I felt was better raised as a separate Issue so as not to cross-contaminate discussions.

For any given dataset, it is possible for the RobustICA run to be "inadequate".
There are two levels of inadequacy:

  1. The Index Quality (IQ) is below some threshold;
  2. The clustering does not converge, and an exception is thrown.

Then, depending on internal logic, there are multiple potential reactions:

  1. Do nothing, but log a warning
    (obviously only applicable to the low IQ case, not the exception case)
  2. Re-try RobustICA, but changing some parameter; for instance:
    1. Use a different clustering method
    2. Use an increased number of runs
    3. Potentially others
  3. Error out, requiring a user to explicitly try different settings for their data.

In its current state at tip 979d026, the code in #1013 does the following:

  • If exception is thrown, try AgglomerativeClustering rather than DBSCAN;
  • If IQ is below 0.6, issue a warning.
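For reference, the behaviour described above can be sketched as control flow. This is only a sketch: `run_robustica` is a hypothetical stand-in for the actual RobustICA invocation in #1013, and the function names are illustrative, not the real tedana code.

```python
import warnings

IQ_THRESHOLD = 0.6  # warning threshold used in #1013 at tip 979d026


def run_robustica(data, clustering):
    """Hypothetical stand-in for the RobustICA call in #1013.

    Assumed to return (sources, mixing, iq) and to raise if the
    clustering step fails to converge.
    """
    raise NotImplementedError  # replaced by the real call in tedana


def fit_with_fallback(data, run=run_robustica):
    """#1013 behaviour at tip 979d026: DBSCAN first, retry with
    AgglomerativeClustering on failure, then warn if IQ is below threshold."""
    try:
        sources, mixing, iq = run(data, clustering="DBSCAN")
    except Exception:
        sources, mixing, iq = run(data, clustering="AgglomerativeClustering")
    if iq < IQ_THRESHOLD:
        warnings.warn(
            f"RobustICA index quality {iq:.2f} is below {IQ_THRESHOLD}"
        )
    return sources, mixing, iq
```

The `run` parameter exists only so the control flow can be exercised in isolation; the real code calls RobustICA directly.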

I want to approach this question with a fresh set of eyes, as the current logic may not be the optimal choice for pushing code out to the public.
In part, there is a question about the evaluation that led to the addition of the fallback to an alternative clustering algorithm when DBSCAN fails. It's possible that AgglomerativeClustering was only found to be necessary when testing on aggressively downsampled data, and that for public consumption it would be preferable to omit that extra logic. Hopefully @BahmanTahayori can provide further insight here.
The discussion I want to initiate here is however not specific to that one point.


  1. If, for a fixed seed + clustering method + number of runs, the clustering is "inadequate", what should the TEDANA software do?

    1. Should this behaviour be different depending on whether an exception was thrown vs. IQ being too low?
  2. Should the RobustICA clustering method be exposed at the TEDANA command-line?

    1. Note that if TEDANA were to include any internal logic (such as it does currently) where it makes multiple attempts using different clustering approaches, then that logic would itself need to be both described in, and accessible from, the command line.
  3. Upon clustering being deemed "inadequate", would it be better to, instead of changing the clustering algorithm, increase the number of runs?

    1. This would require evidence that increasing the number of runs does in fact improve the value of IQ; hopefully @BahmanTahayori can contribute here.
    2. If so, the "number" of robust runs controlled at the command-line would instead be the minimum number of runs.
      It is also likely that you would want to specify a maximum number of runs to prevent the software from running indefinitely.
    3. Ideally, rather than re-running RobustICA from scratch with the same seed but a larger number of runs, the code would instead append the additional runs to the existing dataset, clustering would be re-run on such, and the results would remain deterministic.
      I.e., the result obtained using a fixed seed of 42 and 40 runs should be identical to the result obtained using a fixed seed of 42 when 30 runs were first generated, clustering failed, and an additional 10 runs were appended and clustering was re-run.
      Whether or not this is possible will depend on the exposed interface of RobustICA; hopefully @BahmanTahayori can look into this and feed back.
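Whether this is achievable depends on RobustICA's exposed interface, but the seeding scheme itself is straightforward. Below is a sketch using scikit-learn's FastICA directly (not RobustICA's actual API): deriving one sub-seed per run from the fixed seed up front means run k is identical whether 30 or 40 runs were requested, so appending runs reproduces the all-at-once result.

```python
import numpy as np
from sklearn.decomposition import FastICA


def ica_runs(X, n_components, seeds):
    """One FastICA fit per seed; per-run seeding makes results appendable."""
    comps = []
    for seed in seeds:
        ica = FastICA(n_components=n_components, random_state=int(seed),
                      max_iter=500)
        ica.fit(X)
        comps.append(ica.components_)
    return np.vstack(comps)


# Simulated data: 3 non-Gaussian sources mixed into 8 channels.
rng = np.random.default_rng(0)
X = rng.laplace(size=(200, 3)) @ rng.normal(size=(3, 8))

# Derive all sub-seeds from the fixed seed of 42 before running anything,
# so run k's seed never depends on how many runs were requested in total.
all_seeds = np.random.default_rng(42).integers(0, 2**32 - 1, size=40)

first_30 = ica_runs(X, 3, all_seeds[:30])   # initial attempt: 30 runs
extra_10 = ica_runs(X, 3, all_seeds[30:])   # appended later: 10 more
appended = np.vstack([first_30, extra_10])

upfront = ica_runs(X, 3, all_seeds[:40])    # 40 runs requested from the start
assert np.allclose(appended, upfront)       # identical pooled components
```

Clustering the pooled components would then also be deterministic, provided the clustering step is itself seeded.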
@handwerkerd
Member

In testing out the method, my suspicion is that, if it doesn't converge with one starting seed, it often won't converge with another. The parameter that is likely to make a difference is the initial number of components requested at initialization. I'm still trying to get a better handle on the optimal number of components to request, but most of the failures I've seen are when I request too many, and the little variance spread out across many components is just inconsistent.

For the datasets you're testing, I wonder if you see a higher IQ score with fewer initial components requested.

For your Qs:

  1. I'd lean towards a very big warning rather than an exception. The method works with a single run of fastICA, but it's sensitive to seed selection. If data are so sensitive to seed selection that the robust method doesn't work, that's a warning sign, but I'd rather users still got a usable result. The other option would be to re-run with fewer initial components.

  2. As for what to do, I don't like that it switches clustering methods under the hood without a good understanding of why one method works and the other doesn't.

  3. I'd love a way to add runs without running RobustICA from scratch, but that looks like it might require editing RobustICA itself. The other parameter in RobustICA that isn't reported is how many individual runs of fastICA failed to converge. When I saw things going wrong, what typically happened was that fastICA successfully converged less than 1/3 of the time, but those non-convergences were just printed to the screen by parallelized code and there was no clear way to get a count of how many times that happened.
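On counting non-convergences: when FastICA runs serially, the warnings scikit-learn emits can be captured rather than merely printed. A sketch under that assumption (warning capture does not cross joblib worker processes, which is presumably why the parallelized output was only visible on screen):

```python
import warnings

import numpy as np
from sklearn.decomposition import FastICA
from sklearn.exceptions import ConvergenceWarning


def count_nonconvergence(X, n_components, seeds, max_iter=200):
    """Run FastICA once per seed, counting runs that fail to converge.

    Detection relies on sklearn emitting ConvergenceWarning, since
    FastICA exposes no converged flag directly.
    """
    failures = 0
    for seed in seeds:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always", ConvergenceWarning)
            FastICA(n_components=n_components, random_state=seed,
                    max_iter=max_iter).fit(X)
        if any(issubclass(w.category, ConvergenceWarning) for w in caught):
            failures += 1
    return failures


# Gaussian noise with a tiny iteration budget: expect most runs to fail.
X = np.random.default_rng(0).normal(size=(200, 8))
n_failed = count_nonconvergence(X, n_components=3, seeds=range(5), max_iter=2)
```

Something like this could back a RobustICA feature request to report the converged/attempted ratio alongside IQ.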

@Lestropie
Author

Re point 3: it's possible that the kinds of augmentations that would be ideal from a TEDANA standpoint would both be applicable in other ICA contexts and be more appropriately / cleanly implemented within RobustICA. So the result of this thread might be to bounce feature requests to RobustICA, e.g.:

  • Having RobustICA yield data describing the FastICA failures;
  • Being able to request a number of total FastICA runs vs. number of converged FastICA runs;
  • Ability to control RobustICA completion using alternative criteria.

I'm not an ICA person so can't really comment on precedents in these regards, but I might be able to give advice in terms of software implementation.

@BahmanTahayori

Thanks @Lestropie and @handwerkerd for your suggestions and comments. I am working on the PR and will incorporate your suggestions as much as possible.
