gt gff3 intron request #936

Open
PlantDr430 opened this issue Jan 26, 2020 · 3 comments

PlantDr430 commented Jan 26, 2020

Hello,

It would be helpful if, when adding introns to a GFF3 file, gt gff3 also added IDs instead of just Parent attributes, similar to how exons and CDSs have IDs associated with their parent.

I am using the -retainids -addintrons flags.

Currently, the results look like:

###
1	funannotate	gene	12845	16741	.	+	.	ID=CPUR_00006
1	funannotate	mRNA	12845	16741	.	+	.	ID=CPUR_00006-T1;Parent=CPUR_00006;product=hypothetical protein
1	funannotate	exon	12845	13792	.	+	.	ID=CPUR_00006-T1.exon1;Parent=CPUR_00006-T1
1	funannotate	CDS	12845	13792	.	+	0	ID=CPUR_00006-T1.cds;Parent=CPUR_00006-T1
1	funannotate	intron	13793	13890	.	+	.	Parent=CPUR_00006-T1
1	funannotate	exon	13891	14691	.	+	.	ID=CPUR_00006-T1.exon2;Parent=CPUR_00006-T1
1	funannotate	CDS	13891	14691	.	+	0	ID=CPUR_00006-T1.cds;Parent=CPUR_00006-T1
1	funannotate	intron	14692	15817	.	+	.	Parent=CPUR_00006-T1
1	funannotate	exon	15818	16741	.	+	.	ID=CPUR_00006-T1.exon3;Parent=CPUR_00006-T1
1	funannotate	CDS	15818	16741	.	+	0	ID=CPUR_00006-T1.cds;Parent=CPUR_00006-T1

It would be very helpful if the introns could be returned like this:


1	funannotate	intron	13793	13890	.	+	.	ID=CPUR_00006-T1.in1;Parent=CPUR_00006-T1
1	funannotate	intron	14692	15817	.	+	.	ID=CPUR_00006-T1.in2;Parent=CPUR_00006-T1
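
Until something like this exists in gt gff3, the requested IDs can be added in a post-processing step. The following is a minimal sketch, not part of GenomeTools: the function name `add_intron_ids` is hypothetical, the `.inN` suffix is simply the scheme proposed above, and introns are numbered in file order per parent (for minus-strand transcripts that may not match transcription order). If an intron lists several parents, the sketch assumes the first one is used.

```python
from collections import defaultdict


def add_intron_ids(lines):
    """Give each ID-less intron an ID of the form <Parent>.in<N>.

    Hypothetical helper: takes an iterable of GFF3 lines and returns
    a list of lines without trailing newlines.
    """
    counts = defaultdict(int)  # introns seen so far, per parent transcript
    out = []
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Pass through comments, directives (e.g. ###) and non-intron features.
        if line.startswith("#") or len(cols) != 9 or cols[2] != "intron":
            out.append(line)
            continue
        attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
        if "ID" not in attrs and "Parent" in attrs:
            # Assumption: with multiple parents, number against the first.
            parent = attrs["Parent"].split(",")[0]
            counts[parent] += 1
            cols[8] = f"ID={parent}.in{counts[parent]};{cols[8]}"
        out.append("\t".join(cols))
    return out
```

Run over the output of gt gff3 -retainids -addintrons, this leaves every other feature untouched and only prepends an ID attribute to intron lines that lack one.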

satta commented Feb 9, 2020

Thanks for your feature request. It is very specific, given that IDs are usually only used in GenomeTools to enable Parent-child links and do not carry meaning. The -retainids option is indeed a compromise to not lose data in case IDs actually already contain information.
I think it would not be difficult to implement an ID assignment scheme like the one you propose, but it would need to be restricted to the use case where -retainids and -addintrons, as well as something like -intronids, are all set.

maol-corteva commented

Perhaps a suggestion...
What about a generic flag (e.g. -retainids -addmissingids) that adds new IDs to child nodes that don't have one, even if they are childless? I know this is not the original intention of IDs (their only purpose was to let children be linked to their parent). However, I have run into external tools that, when importing GFF annotations, rely on these IDs as DB index keys (or primary keys) for all features and children.


satta commented May 25, 2021

So you are just referring to leaf nodes, because all internal nodes would always get an ID for connecting children?
Please be aware that these would then not necessarily follow the naming scheme that you are trying to keep with -retainids, as we can't predict or interpret how these would be formed (the current automatic ID generation, IIRC, just produces an ID from type plus incremented number).
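
To make the trade-off concrete, here is a rough sketch of what "type plus incremented number" IDs for ID-less leaf features could look like as a post-processing step. It is an illustration only: `add_missing_ids` is a hypothetical name, -addmissingids is just the flag suggested in this thread, and the exact format GenomeTools uses for generated IDs is not confirmed here. The sketch skips any generated ID that already exists in the file so that retained IDs are never duplicated.

```python
from collections import defaultdict


def add_missing_ids(lines):
    """Assign <type><N> IDs to features that have no ID attribute.

    Hypothetical helper: takes an iterable of GFF3 lines and returns
    a list of lines without trailing newlines.
    """
    rows = [line.rstrip("\n") for line in lines]

    # First pass: collect IDs already present so new ones cannot collide.
    existing = set()
    for row in rows:
        cols = row.split("\t")
        if not row.startswith("#") and len(cols) == 9:
            for attr in cols[8].split(";"):
                if attr.startswith("ID="):
                    existing.add(attr[3:])

    # Second pass: generate IDs from feature type plus a per-type counter.
    counts = defaultdict(int)
    out = []
    for row in rows:
        cols = row.split("\t")
        has_id = len(cols) == 9 and any(
            attr.startswith("ID=") for attr in cols[8].split(";")
        )
        if row.startswith("#") or len(cols) != 9 or has_id:
            out.append(row)
            continue
        ftype = cols[2]
        while True:
            counts[ftype] += 1
            new_id = f"{ftype}{counts[ftype]}"
            if new_id not in existing:
                break
        existing.add(new_id)
        # A lone "." (or empty field) means the feature has no attributes.
        cols[8] = f"ID={new_id}" if cols[8] in ("", ".") else f"ID={new_id};{cols[8]}"
        out.append("\t".join(cols))
    return out
```

As noted above, IDs produced this way (intron1, intron2, ...) are unique but do not follow whatever naming scheme the retained IDs use.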
