FORMAT tags not in caps lock are not loaded in the genotype table #582

Open
SamuelNicaise opened this issue Apr 27, 2023 · 0 comments
Labels: bug (Something isn't working)

@SamuelNicaise
Collaborator

Genotype fields whose name is not entirely in caps are not loaded in the database (so all values for those fields are None).

Per the VCF 4.3 specification, FORMAT tag names must match the regular expression ^[A-Za-z_][0-9A-Za-z_.]*$, so lowercase and mixed-case names are valid.
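As a quick check of that rule, here is a minimal sketch that tests a few tag names against the pattern. The helper name `is_valid_format_key` and the sample names other than `VAF_min` are only illustrative.

```python
import re

# FORMAT key pattern from the VCF 4.3 specification (underscores included).
FORMAT_KEY_PATTERN = re.compile(r"[A-Za-z_][0-9A-Za-z_.]*")


def is_valid_format_key(name):
    """Return True if `name` is a legal FORMAT tag name (hypothetical helper)."""
    # fullmatch anchors the pattern at both ends of the string.
    return FORMAT_KEY_PATTERN.fullmatch(name) is not None


for name in ["GT", "VAF_min", "9bad"]:
    print(name, is_valid_format_key(name))
```

`VAF_min` passes the check, so a reader that only handles all-uppercase tags is stricter than the specification.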

To Reproduce

  1. Create a VCF with a FORMAT tag whose name is not entirely uppercase, for example:
    ##FORMAT=<ID=VAF_min,Number=1,Type=Float,Description="VAF Variant Frequency minimum [Release=0.9.3;Date=20210721;AnnotationType=calculation]">

  2. Create a new project with that VCF

  3. Open the genotype module, see that all the genotype values for that field are None

image

  4. Open the database with a SQLite viewer, see that all the values in the genotype table for that field are None

image
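For step 1, a small test file can be generated with a sketch like the one below. The contig, position, sample name, genotype values, and the shortened Description are all made up for illustration; only the `VAF_min` tag name comes from the report.

```python
from pathlib import Path

# Minimal VCF with one mixed-case FORMAT tag (VAF_min).
# All record values below are invented for the reproduction.
MINIMAL_VCF = "\n".join([
    "##fileformat=VCFv4.3",
    "##contig=<ID=chr1>",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=VAF_min,Number=1,Type=Float,'
    'Description="VAF Variant Frequency minimum">',
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
               "FILTER", "INFO", "FORMAT", "SAMPLE1"]),
    "\t".join(["chr1", "12345", ".", "A", "G", "50",
               "PASS", ".", "GT:VAF_min", "0/1:0.42"]),
]) + "\n"


def write_test_vcf(path):
    """Write the minimal VCF to `path` and return it as a Path (hypothetical helper)."""
    target = Path(path)
    target.write_text(MINIMAL_VCF, encoding="utf-8")
    return target
```

Calling `write_test_vcf("vaf_min_test.vcf")` produces a file that can be imported in step 2.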

Note:

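  • This probably comes from the block below in parse_variants() in the VcfReader class. The lookup uses gt_field.upper(), so a tag such as VAF_min would be looked up as VAF_MIN; that lookup presumably fails, and the except clause silently discards the error: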
      for gt_field in format_fields:
          try:
              value = sample[gt_field.upper()]
              if isinstance(value, list):
                  value = ",".join(str(i) for i in value)
              sample_data[gt_field] = value
          except AttributeError:
              pass
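A possible direction for a fix, as a rough sketch only: look the field up under its exact name instead of the uppercased one. This assumes format_fields holds the tag names exactly as they appear in the VCF header; if the reader normalizes names elsewhere, a mapping from the normalized name back to the original tag would be needed instead. `FakeSample` and `extract_genotype_fields` are hypothetical stand-ins, not part of the project's code.

```python
from types import SimpleNamespace


class FakeSample:
    """Hypothetical stand-in for the reader's sample object.

    Item access is forwarded to attribute access, so a missing field
    raises AttributeError, matching the except clause in the report.
    """

    def __init__(self, **fields):
        self._data = SimpleNamespace(**fields)

    def __getitem__(self, key):
        return getattr(self._data, key)


def extract_genotype_fields(sample, format_fields):
    """Collect genotype values using the tag names as written (no .upper())."""
    sample_data = {}
    for gt_field in format_fields:
        try:
            value = sample[gt_field]  # exact, case-sensitive lookup
        except (AttributeError, KeyError):
            continue  # field absent for this sample
        if isinstance(value, list):
            value = ",".join(str(i) for i in value)
        sample_data[gt_field] = value
    return sample_data


sample = FakeSample(GT="0/1", VAF_min=0.42, AD=[10, 7])
result = extract_genotype_fields(sample, ["GT", "VAF_min", "AD", "DP"])
print(result)
```

With the exact-name lookup, `VAF_min` is kept alongside the uppercase tags, and the absent `DP` field is skipped rather than stored.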
@antonylebechec added the bug label on May 10, 2023