Don't display attributes expanded for dataset #1157

Open
jeromekelleher opened this issue Dec 14, 2023 · 5 comments
Labels
enhancement New feature or request
Comments

@jeromekelleher
Collaborator

When you look at a dataset derived from VCF in a notebook, you get this:

Screenshot from 2023-12-14 13-00-11
Screenshot from 2023-12-14 12-59-49

The attributes are automatically "open", which means that the VCF header attribute (which will be several megabytes for large datasets) dominates the display.

I'm not sure this is something we can influence, but can we either truncate the VCF header attribute for display, or tweak the display of the dataset somehow to at least keep the attributes "closed" by default?

@jeromekelleher jeromekelleher added the enhancement New feature or request label Dec 14, 2023
@jeromekelleher jeromekelleher added this to the 0.8.0 release milestone Dec 14, 2023
@jeromekelleher
Collaborator Author

Alternatively we could discard the "#CHROM ..." line of the VCF header, since we can reproduce it using the sample_id variable. Also, it's wrong when we do a subset operation.
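
A rough sketch of that idea, assuming the header is stored as a single newline-separated string and that the sample IDs are available as a plain list. The helper names `strip_column_header` and `column_header_line` are hypothetical, not existing sgkit functions; the column layout follows the VCF specification (eight fixed columns, then FORMAT and the sample IDs when genotype data is present).

```python
# Fixed, tab-delimited columns required by the VCF specification.
VCF_FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def strip_column_header(vcf_header):
    """Return the header text without its "#CHROM ..." line (hypothetical helper)."""
    kept = [line for line in vcf_header.splitlines() if not line.startswith("#CHROM")]
    return "\n".join(kept) + "\n"


def column_header_line(sample_ids):
    """Rebuild the "#CHROM ..." line from the current sample IDs (hypothetical helper)."""
    columns = list(VCF_FIXED_COLUMNS)
    if len(sample_ids) > 0:
        # FORMAT and the per-sample columns only appear when there are samples.
        columns.append("FORMAT")
        columns.extend(sample_ids)
    return "\t".join(columns)


header = "##fileformat=VCFv4.3\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA1\tNA2\tNA3\n"
stored = strip_column_header(header)

# After subsetting to two samples, the regenerated line matches the subset,
# whereas the stored "#CHROM ..." line would still list all three.
subset_line = column_header_line(["NA1", "NA3"])
```

Regenerating the line on demand keeps it consistent with whatever samples are currently in the dataset.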

@timothymillar
Collaborator

It can be controlled with an xarray setting: https://docs.xarray.dev/en/stable/generated/xarray.set_options.html#xarray-set-options
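
For reference, a minimal sketch of what that could look like, assuming the `display_expand_attrs` option of `xarray.set_options`. The dataset below is a hypothetical stand-in for a VCF-derived one, and `vcf_header` as the attribute name is an assumption.

```python
import xarray as xr

# Hypothetical stand-in for a VCF-derived dataset with a large header attribute.
ds = xr.Dataset(
    {"call_genotype": (("variants", "samples"), [[0, 1], [1, 1]])},
    attrs={"vcf_header": "##fileformat=VCFv4.3\n" * 1000},
)

# Keep the attributes section collapsed in the notebook (HTML) repr.
xr.set_options(display_expand_attrs=False)

# In a notebook, evaluating `ds` in a cell renders this HTML.
html = ds._repr_html_()
```

`set_options` can also be used as a context manager (`with xr.set_options(display_expand_attrs=False): ...`) if the change should only apply to a single display call. Note that this is a user-side setting, so it changes the default only for sessions that set it.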

@tomwhite
Collaborator

This originally came up here: https://github.com/pystatgen/sgkit/issues/463#issuecomment-827445369

@jeromekelleher
Collaborator Author

As a quick aside @tomwhite, do we ever use the "#CHROM POS.." line from the VCF header? If not, I think we should discard it, as there's no real information there (I'll open an issue).

@tomwhite
Collaborator

@jeromekelleher we used to use the "#CHROM POS.." line to support round-tripping of VCF -> Zarr -> VCF, but we can generate the header now, so it may not be necessary to store it. See https://github.com/pystatgen/sgkit/blob/2ab47b587768bed166d3c477694bed06250123c9/sgkit/io/vcf/vcf_writer.py#L412-L559

@jeromekelleher jeromekelleher modified the milestones: 0.8.0 release, 0.8.1 Feb 9, 2024