gt gff3 loss of intergenic regions #1022

Open
edinatale opened this issue Apr 12, 2023 · 1 comment

edinatale commented Apr 12, 2023

I would like to add introns to my genome annotation, mygenome.gff3.
After running the command below, I get an output file, genome_with_introns.gff3, that no longer contains any "intergenic_region" annotations.
Is there something I'm not considering that would help me solve this issue?

mygenome.gff3

##gff-version 3
##sequence-region chr_00 1 18946431
chr_00	assembly	sequence_assembly	1	18946431	.	+	.	ID=chr_00;Ontology_term=SO:0000353
chr_00	ORCAE	intergenic_region	1	149	.	.	.	ID=inter:first;Name=first;Ontology_term=SO:0000605
chr_00	ORCAE	gene	150	6731	.	-	.	ID=Ec-00_000010;Name=Ec-00_000010;Alias=Esi_1000_0001,Esi1000_0001;length=6582
chr_00	ORCAE	mRNA	150	6731	.	-	.	ID=Ec-00_000010.1;Parent=Ec-00_000010;gene_id=Ec-00_000010.1;Name=Ec-00_000010.1;Alias=Esi_1000_0001,Esi1000_0001;length=1971
chr_00	ORCAE	exon	150	428	.	-	.	ID=Ec-00_000010.1.10;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000202
chr_00	ORCAE	exon	898	1100	.	-	.	ID=Ec-00_000010.1.9;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004
chr_00	ORCAE	exon	1536	1674	.	-	.	ID=Ec-00_000010.1.8;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004
chr_00	ORCAE	exon	2092	2268	.	-	.	ID=Ec-00_000010.1.7;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004

my command

gt gff3 -retainids -addintrons my_genome.gff3  > my_genome_with_introns.gff3

genome_with_introns.gff3

chr_00  ORCAE   gene    150     6731    .       -       .       ID=Ec-00_000010;Name=Ec-00_000010;Alias=Esi_1000_0001,Esi1000_0001;length=6582
chr_00  ORCAE   mRNA    150     6731    .       -       .       ID=Ec-00_000010.1;Parent=Ec-00_000010;gene_id=Ec-00_000010.1;Name=Ec-00_000010.1;Alias=Esi_1000_0001,Esi1000_0001;length=1971
chr_00  ORCAE   exon    150     428     .       -       .       ID=Ec-00_000010.1.10;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000202
chr_00  ORCAE   CDS     150     428     .       -       0       ID=CDS:Ec-00_000010.1;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Name=Ec-00_000010.1;Ontology_term=SO:0000202
chr_00  .       intron  429     897     .       -       .       Parent=Ec-00_000010.1
chr_00  ORCAE   exon    898     1100    .       -       .       ID=Ec-00_000010.1.9;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004
chr_00  ORCAE   CDS     898     1100    .       -       2       ID=CDS:Ec-00_000010.1;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Name=Ec-00_000010.1;Ontology_term=SO:0000004
chr_00  .       intron  1101    1535    .       -       .       Parent=Ec-00_000010.1
chr_00  ORCAE   exon    1536    1674    .       -       .       ID=Ec-00_000010.1.8;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004

gt -version 1.6.1
Linux x86_64

Member

satta commented Apr 16, 2023

Unfortunately, I cannot reproduce this:

$ gt -version | head -n 1
gt (GenomeTools) 1.6.1
$ cat test1022.gff3 
##gff-version 3
##sequence-region chr_00 1 18946431
chr_00	assembly	sequence_assembly	1	18946431	.	+	.	ID=chr_00;Ontology_term=SO:0000353
chr_00	ORCAE	intergenic_region	1	149	.	.	.	ID=inter:first;Name=first;Ontology_term=SO:0000605
chr_00	ORCAE	gene	150	6731	.	-	.	ID=Ec-00_000010;Name=Ec-00_000010;Alias=Esi_1000_0001,Esi1000_0001;length=6582
chr_00	ORCAE	mRNA	150	6731	.	-	.	ID=Ec-00_000010.1;Parent=Ec-00_000010;gene_id=Ec-00_000010.1;Name=Ec-00_000010.1;Alias=Esi_1000_0001,Esi1000_0001;length=1971
chr_00	ORCAE	exon	150	428	.	-	.	ID=Ec-00_000010.1.10;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000202
chr_00	ORCAE	exon	898	1100	.	-	.	ID=Ec-00_000010.1.9;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004
chr_00	ORCAE	exon	1536	1674	.	-	.	ID=Ec-00_000010.1.8;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004
chr_00	ORCAE	exon	2092	2268	.	-	.	ID=Ec-00_000010.1.7;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004
$ gt gff3 -retainids -addintrons test1022.gff3 > test1022_with_introns.gff3
$ cat test1022_with_introns.gff3
##gff-version 3
##sequence-region   chr_00 1 18946431
chr_00	assembly	sequence_assembly	1	18946431	.	+	.	ID=chr_00;Ontology_term=SO:0000353
###
chr_00	ORCAE	intergenic_region	1	149	.	.	.	ID=inter:first;Name=first;Ontology_term=SO:0000605
###
chr_00	ORCAE	gene	150	6731	.	-	.	ID=Ec-00_000010;Name=Ec-00_000010;Alias=Esi_1000_0001,Esi1000_0001;length=6582
chr_00	ORCAE	mRNA	150	6731	.	-	.	ID=Ec-00_000010.1;Parent=Ec-00_000010;gene_id=Ec-00_000010.1;Name=Ec-00_000010.1;Alias=Esi_1000_0001,Esi1000_0001;length=1971
chr_00	ORCAE	exon	150	428	.	-	.	ID=Ec-00_000010.1.10;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000202
chr_00	.	intron	429	897	.	-	.	Parent=Ec-00_000010.1
chr_00	ORCAE	exon	898	1100	.	-	.	ID=Ec-00_000010.1.9;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004
chr_00	.	intron	1101	1535	.	-	.	Parent=Ec-00_000010.1
chr_00	ORCAE	exon	1536	1674	.	-	.	ID=Ec-00_000010.1.8;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004
chr_00	.	intron	1675	2091	.	-	.	Parent=Ec-00_000010.1
chr_00	ORCAE	exon	2092	2268	.	-	.	ID=Ec-00_000010.1.7;Parent=Ec-00_000010.1;gene_id=Ec-00_000010.1;Ontology_term=SO:0000004
###

As you can see, both the intergenic_region and the assembly features, which are missing from your output, are still there.
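
To see at a glance which feature types were lost between two files, you can tally the type column (column 3) of each and compare. The following is a minimal sketch in Python, not part of GenomeTools; the function names `feature_type_counts` and `missing_types` are hypothetical. It assumes tab-separated columns and falls back to whitespace splitting for pasted text where tabs have become spaces.

```python
from collections import Counter


def feature_type_counts(gff3_text):
    """Count feature types (column 3), skipping directives, comments and blank lines."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            # Assumption: pasted text may have had its tabs converted to spaces.
            fields = line.split(None, 8)
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def missing_types(input_text, output_text):
    """Return the feature types present in the input but absent from the output."""
    before = feature_type_counts(input_text)
    after = feature_type_counts(output_text)
    return sorted(t for t in before if t not in after)
```

Calling `missing_types(open("my_genome.gff3").read(), open("my_genome_with_introns.gff3").read())` would list the types that disappeared, which helps narrow down which step dropped them.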

Your result GFF looks a little weird as well -- there is no ##sequence-region directive, for example, which GenomeTools always outputs. This does not look like direct GenomeTools output. Has the output been postprocessed in any way, e.g. piped through another tool?
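
A quick way to test whether a file still carries the directives mentioned above is to collect its `##` lines. This is only a sketch under the assumption that directives start with `##` followed by a name; the helper name `directives` is hypothetical and not part of GenomeTools.

```python
def directives(gff3_text):
    """Return the set of '##name' directive names found in GFF3 text."""
    names = set()
    for line in gff3_text.splitlines():
        # A line starting with '###' is a feature-group terminator, not a named directive.
        if line.startswith("##") and not line.startswith("###"):
            parts = line[2:].split()
            if parts:
                names.add(parts[0])
    return names
```

If `"sequence-region" not in directives(text)` holds for the output file, that suggests the file was modified after GenomeTools wrote it.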
