datalad datasets : unable to read volumes #65

Open
aryamehta2006 opened this issue Jul 11, 2022 · 22 comments

Comments

@aryamehta2006

VisualQC faces some issue with reading MR volumes from BIDS format downloaded via datalad -- see log below. cc @yarikoptic

(base) aryamehta@Aryas-Air ~ % visualqc_anatomical -b /Users/aryamehta/datasets/ds002785 -old

Anatomical MRI module
Time stamp : 2022-07-11 16:41:35

version info: visualqc 0.6.1
numpy 1.21.5 / scipy 1.7.3 / matplotlib 3.5.1
python 3.9.12 (main, Jun  1 2022, 06:34:44) 
[Clang 12.0.0 ]
platform macOS-12.4-arm64-arm-64bit
Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:29 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T8101


Input folder: /Users/aryamehta/datasets/ds002785
Output folder: /Users/aryamehta/datasets/ds002785/visualqc
outlier detection: disabled, as requested.
Restoring ratings from previous session(s), if they exist ..
To be reviewed : 216


Reviewing MD5E-s6747706--db99fa634eb92335db8a483331f7806a.nii.gz
Traceback (most recent call last):
  File "/Users/aryamehta/opt/anaconda3/bin/visualqc_anatomical", line 8, in <module>
    sys.exit(main())
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/__t1_mri__.py", line 12, in main
    t1_mri.cli_run()
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/t1_mri.py", line 872, in cli_run
    wf.run()
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/workflows.py", line 87, in run
    self.loop_through_units()
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/workflows.py", line 224, in loop_through_units
    skip_subject = self.load_unit(unit_id)
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/t1_mri.py", line 507, in load_unit
    self.current_img_raw = read_image(t1_mri_path, error_msg='T1 mri')
  File "/Users/aryamehta/opt/anaconda3/lib/python3.9/site-packages/visualqc/utils.py", line 37, in read_image
    raise IOError('Given path to {} does not exist!\n\t{}'
OSError: Given path to T1 mri does not exist!
	/Users/aryamehta/datasets/ds002785/.git/annex/objects/x8/Z5/MD5E-s6747706--db99fa634eb92335db8a483331f7806a.nii.gz/MD5E-s6747706--db99fa634eb92335db8a483331f7806a.nii.gz
(base) aryamehta@Aryas-Air ~ % 
@yarikoptic

yarikoptic commented Jul 11, 2022

did you datalad get the content of that dataset before running visualqc_anatomical?

@raamana
Owner

raamana commented Jul 11, 2022

That is how it was originally downloaded, but we copy-pasted it to another computer (outside datalad); that's probably the source of the error.

But we can see the MRI scans, so there should be MRI scan data inside there, no?

@raamana
Owner

raamana commented Jul 11, 2022

I get the same error in the computer where I did the datalad get btw:

(base) $ 19:04:49 Quark ds002785 >>  vqct1 -b $PWD -old

Anatomical MRI module
Time stamp : 2022-07-11 19:04:54

version info: visualqc 0.6.1
numpy 1.17.4 / scipy 1.1.0 / matplotlib 3.5.1
python 3.7.2 (default, Dec 29 2018, 00:00:04)
[Clang 4.0.1 (tags/RELEASE_401/final)]
platform Darwin-21.4.0-x86_64-i386-64bit
Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64


/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/bids/layout/models.py:152: FutureWarning: The 'extension' entity currently excludes the leading dot ('.'). As of version 0.14.0, it will include the leading dot. To suppress this warning and include the leading dot, use `bids.config.set_option('extension_initial_dot', True)`.
  FutureWarning)
Input folder: /Volumes/work/Pitt/datasets/ds002785
Output folder: /Volumes/work/Pitt/datasets/ds002785/visualqc
outlier detection: disabled, as requested.
Restoring ratings from previous session(s), if they exist ..
To be reviewed : 216


Reviewing MD5E-s6406026--f20d90f38f7122ca08d290b502661802.nii.gz
Traceback (most recent call last):
  File "/Users/Reddy/anaconda3/envs/py36/bin/vqct1", line 8, in <module>
    sys.exit(main())
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/__t1_mri__.py", line 12, in main
    t1_mri.cli_run()
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/t1_mri.py", line 872, in cli_run
    wf.run()
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/workflows.py", line 87, in run
    self.loop_through_units()
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/workflows.py", line 224, in loop_through_units
    skip_subject = self.load_unit(unit_id)
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/t1_mri.py", line 507, in load_unit
    self.current_img_raw = read_image(t1_mri_path, error_msg='T1 mri')
  File "/Users/Reddy/anaconda3/envs/py36/lib/python3.7/site-packages/visualqc/utils.py", line 38, in read_image
    ''.format(error_msg, img_spec))
OSError: Given path to T1 mri does not exist!
	/Volumes/work/Pitt/datasets/ds002785/.git/annex/objects/jz/W2/MD5E-s6406026--f20d90f38f7122ca08d290b502661802.nii.gz/MD5E-s6406026--f20d90f38f7122ca08d290b502661802.nii.gz
(base) $ 19:08:33 Quark ds002785 >>

what commands can I run to ensure it was gotten / installed properly? I tried metadata but it didn't work:

(base) $ 19:11:34 Quark ds002785 >>  datalad metadata -d $PWD
[WARNING] Found no aggregated metadata info file /Volumes/work/Pitt/datasets/ds002785/.datalad/metadata/aggregate_v1.json. You will likely need to either update the dataset from its original location or reaggregate metadata locally.
[WARNING] Dataset at . contains no aggregated metadata on this path [metadata(/Volumes/work/Pitt/datasets/ds002785)]
(base) $ 19:11:39 Quark ds002785 >>
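
One rough way to check whether the annexed content is actually present is to look for dangling symlinks. This is a minimal sketch, assuming the dataset uses git-annex's default symlink layout (not unlocked files or an adjusted branch, where missing content shows up as small pointer files instead). The function name `missing_annex_content` and the default glob pattern are hypothetical choices for illustration, not part of datalad or visualqc.

```python
from pathlib import Path
from typing import List


def missing_annex_content(dataset_root: str,
                          pattern: str = "sub-*/anat/*_T1w.nii.gz") -> List[Path]:
    """Return files matching `pattern` that are symlinks with no target on disk.

    Hypothetical helper: assumes annexed files are symlinks into
    .git/annex/objects, which dangle until their content is fetched.
    """
    root = Path(dataset_root)
    missing = []
    for path in sorted(root.glob(pattern)):
        # Path.exists() follows symlinks, so it is False for a dangling link,
        # while Path.is_symlink() inspects the link itself.
        if path.is_symlink() and not path.exists():
            missing.append(path)
    return missing
```

Calling `missing_annex_content("/path/to/ds002785")` would then list the T1w files whose content still has to be fetched; an empty list suggests the content is there.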

@raamana
Owner

raamana commented Jul 11, 2022

now that I think about it, I realize I only installed one of the derivatives (freesurfer), and not the base BIDS dataset. I am now running datalad get sub-????/anat/* and will see whether the error reappears after the download is finished! My bad :)

@raamana
Owner

raamana commented Jul 11, 2022

I get the following, and it worked this time:

(base) $ 19:14:57 Quark ds002785 >>  datalad get sub-????/anat/*
get(ok): sub-0001/anat/sub-0001_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0067/anat/sub-0067_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0064/anat/sub-0064_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0135/anat/sub-0135_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0152/anat/sub-0152_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0193/anat/sub-0193_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0033/anat/sub-0033_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0213/anat/sub-0213_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0091/anat/sub-0091_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-0149/anat/sub-0149_T1w.nii.gz (file) [from s3-PUBLIC...]
  [206 similar messages have been suppressed]
action summary:
  get (notneeded: 216, ok: 216)

i don't understand the notneeded part in get (notneeded: 216 ... message though?

although I see two issues:

  • it's very slow to traverse the dataset for some reason! All visualqc does is look for valid MRIs, which should be super fast, so not sure what loop it's getting into while trying to index the */anat/*_T1w.nii.gz files
  • due to fully resolving the file path, real subject IDs (like sub-0001) are being replaced with an MD5 hash like MD5E-s6843657--a5192feb724d3a07d8f20724bfce9f47.nii.gz. This is an issue as we need to record the QC ratings against the subject IDs, so I'll have to figure out a better way to index BIDS datasets obtained via datalad. My current implementation works with plain BIDS datasets, without symlinks managed by datalad
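
To illustrate the second point, here is a minimal sketch of taking the subject ID from the unresolved BIDS path rather than from the resolved annex object path. The helper `subject_id_from_path` is hypothetical (it is not visualqc's implementation), and the `sub-<label>` pattern assumes alphanumeric BIDS subject labels. The two example paths are only illustrative: the annex key is copied from the traceback above and is not claimed to belong to sub-0001.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# BIDS subject directories look like "sub-<label>" with an alphanumeric label.
_SUBJECT_DIR = re.compile(r"^sub-[A-Za-z0-9]+$")


def subject_id_from_path(path: str) -> Optional[str]:
    """Return the first 'sub-<label>' directory component, or None if absent."""
    for part in PurePosixPath(path).parts:
        if _SUBJECT_DIR.match(part):
            return part
    return None


# Path as it appears in the BIDS tree (symlink left unresolved).
unresolved = "ds002785/sub-0001/anat/sub-0001_T1w.nii.gz"
# Kind of path obtained after fully resolving an annexed symlink.
resolved = ("ds002785/.git/annex/objects/jz/W2/"
            "MD5E-s6406026--f20d90f38f7122ca08d290b502661802.nii.gz/"
            "MD5E-s6406026--f20d90f38f7122ca08d290b502661802.nii.gz")

print(subject_id_from_path(unresolved))  # sub-0001
print(subject_id_from_path(resolved))    # None
```

The unresolved path can still be handed to the image reader as-is, since opening a symlink reads the file it points to; only the bookkeeping needs the original name.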

@yarikoptic

Let's zoom tomorrow?

@raamana
Owner

raamana commented Jul 12, 2022

Sure! Tomorrow is a bit tough with dental appointments and other things but Thursday afternoon works. Or Friday?

@yarikoptic

sure, just let me know the time ;-) Thu we have ReproNim coworking time 11-5pm which happens in NMIND gather town, so can meet there

@raamana
Owner

raamana commented Jul 12, 2022

we are trying to use the dataset on an M1 MacBook, and it appears installing datalad on it is not easy (and definitely not for a high school student)

i wish openfmri folks let us download the dataset, or parts of it, from a browser :). cc @effigies

i will check if AWS CLI works on M1 MacBook

@yarikoptic

there is always a trade-off between "I want the flashiest latest cool gadget from a company which does not really care about science" and "I want a system for doing science" ;)

It is all on S3, you can use s3 clients to download straight from S3.

Re M1 -- you should install Rosetta, and then git-annex should be installable from brew IIRC. Some details here: datalad/datalad#5701

@effigies

i wish openfmri folks let us download the dataset, or parts of it, from a browser

OpenNeuro permits downloading; I believe recent Chrome or Firefox is needed for the download API needed to work with such large datasets. If you're still using legacy.openfmri.org, then I think there are tarballs, but these datasets are not kept in sync with OpenNeuro.

@raamana
Owner

raamana commented Jul 12, 2022

damn, that's good to know. I was always seeing it from safari, and there was no indication at all that we could download it from a browser. I would suggest leaving a note to ask folks to use Chrome or Firefox, instead of silently removing that option on safari

@raamana
Owner

raamana commented Jul 12, 2022

it doesn't seem to work on Firefox btw, at least for the 2 datasets I looked at

@effigies

You're right, it looks like Mozilla is not implementing this API; for some reason I thought they had. Looks like Chrome, Edge and Opera do implement it. https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API#browser_compatibility

@raamana
Owner

raamana commented Jul 14, 2022

Hi Yarik, I am available in the next few hours if you want to look into this issue.

@yarikoptic

pinged you on twitter with url to nmind if you don't know

@yarikoptic

ok, since zooming didn't happen, let me follow up on the original datalad-related issues from the last related comment by @raamana:

i don't understand the notneeded part in get (notneeded: 216 ... message though?

most likely those 216 were already obtained

it's very slow to traverse the dataset for some reason! All visualqc does is look for valid MRIs, which should be super fast, so not sure what loop it's getting into while trying to index the */anat/*_T1w.nii.gz files

I have not looked inside: if the visualqc traversal also traverses .git -- you might like to "disable" that. FWIW, here is our simple "walker" which excludes vcs subfolders by default: https://github.com/dandi/dandi-cli/blob/1c947365311732943753e15199a57c9bfd2759bf/dandi/utils.py#L260

regardless of datalad, you might benefit from speeding up the walk through multithreading -- we have it in https://github.com/dandi/dandi-cli/blob/master/dandi/support/threaded_walk.py but there we have not added any vcs folders exclusion yet (used only within zarr folders) -- filed dandi/dandi-cli#1086 to possibly harmonize.
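
For illustration, a walk that skips version-control folders could look roughly like the sketch below. It is not the dandi-cli walker linked above; the function name `find_t1w`, the `VCS_DIRS` set, and the `_T1w.nii.gz` suffix filter are assumptions made for this example.

```python
import os
from typing import Iterable, Iterator

# Folder names treated as version-control internals (an assumed list).
VCS_DIRS = frozenset({".git", ".svn", ".hg", ".bzr"})


def find_t1w(root: str, exclude_dirs: Iterable[str] = VCS_DIRS) -> Iterator[str]:
    """Yield paths of *_T1w.nii.gz files under root, never entering exclude_dirs."""
    excluded = set(exclude_dirs)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place: os.walk only descends into what remains in dirnames,
        # so .git/annex/objects is never listed at all.
        dirnames[:] = sorted(d for d in dirnames if d not in excluded)
        for name in sorted(filenames):
            if name.endswith("_T1w.nii.gz"):
                # The path is yielded unresolved, so the sub-XXXX name is kept.
                yield os.path.join(dirpath, name)
```

Pruning `dirnames` in place is what keeps the cost down: the annex object store is skipped entirely instead of being walked and filtered afterwards.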

  • due to fully resolving the file path, real subject IDs (like sub-0001) are being replaced with an MD5 hash like MD5E-s6843657--a5192feb724d3a07d8f20724bfce9f47.nii.gz. This is an issue as we need to record the QC ratings against the subject IDs, so I'll have to figure out a better way to index BIDS datasets obtained via datalad. My current implementation works with plain BIDS datasets, without symlinks managed by datalad

such "resolve to the death" plagues many things, including browsers, AFNI etc. Often they come up with a switch to "do not bother resolving", and since I do not know the details here I can only arrogantly state "there should be no need to resolve symlinks since that would incorporate some ad-hoc assumption on their purpose. If there is such an ad-hoc assumption -- make it more explicit". So what is the assumption which makes you resolve the paths here? ;-)

@raamana
Owner

raamana commented Aug 10, 2022

thanks Yarik for the detailed notes. I was thinking of potentially excluding certain paths like .git etc, but I was afraid of making any ad-hoc changes to file path management that might introduce funny behaviour across platforms

@raamana
Owner

raamana commented Aug 10, 2022

I resolve paths by default as one of the several best practices for file/path management -- I don't understand the case against resolving though, except in extreme situations of large number of layers of sym-linking (which is often not the case with most regular users)

@yarikoptic

I resolve paths by default as one of the several best practices for file/path management ...

could you provide a reference for such a best practice? My mileage goes against it ;-)

@raamana
Owner

raamana commented Aug 11, 2022

I guess we approach it with different experiences from the past :). One obvious rationale is to avoid depending on relative paths, which caused some issues for me before, esp. when the same tool is used to process different projects and datasets

@yarikoptic

yarikoptic commented Aug 11, 2022

one obvious rationale is to avoid depending on relative paths,

"relative path" (e.g., sub-01/blah.nii.gz) -> "absolute path" (e.g., /home/pradeep/favoritebids/sub-01/blah.nii.gz) -> "resolved path" (e.g., /tmp/junk/scannedyesterday.dat), so it seems you want "absolute paths" but are talking about "resolved paths" while skipping the "absolute" intermediate. Is that right?

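The three path forms named above can be sketched in Python as follows. This is only an illustration under assumptions: `path_forms` is a hypothetical helper, and the file names reuse the examples from the comment inside a throwaway temporary directory (symlink creation may need extra privileges on Windows).

```python
import os
import tempfile


def path_forms(relative: str, base: str) -> dict:
    """Return the relative, absolute, and resolved forms of one path."""
    # "Absolute" only anchors the path at base; symlinks are left untouched.
    absolute = os.path.normpath(os.path.join(base, relative))
    return {
        "relative": relative,
        "absolute": absolute,
        # "Resolved" additionally follows every symlink along the way.
        "resolved": os.path.realpath(absolute),
    }


with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temp dir itself first so only the file symlink matters below.
    base = os.path.realpath(tmp)
    os.makedirs(os.path.join(base, "sub-01"))
    target = os.path.join(base, "scannedyesterday.dat")
    open(target, "w").close()
    os.symlink(target, os.path.join(base, "sub-01", "blah.nii.gz"))

    forms = path_forms(os.path.join("sub-01", "blah.nii.gz"), base)
    for kind, value in forms.items():
        print(f"{kind:>8}: {value}")
```

The absolute form no longer depends on the working directory yet still ends in sub-01/blah.nii.gz, while the resolved form ends in scannedyesterday.dat and has lost the subject name.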