Read gene names from postgres pap_gene table instead of Textpresso categories #38

valearna · 2021-02-25T22:59:38Z

Textpresso category queries are too slow, and pap_gene already contains many gene names. Future automated extraction pipelines should populate pap_gene with gene names that match those extracted by tpc.

valearna · 2021-03-25T21:28:35Z

For now we can continue to use the list of genes from Textpresso. For papers with a high proportion of genes (# genes in paper / total # C. elegans genes), we could remove genes mentioned only once, which would take care of high-throughput experiments. Reading genes from postgres would still be faster, but we need to wait for a pipeline that can extract genes from full text and not only abstracts.
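
A minimal sketch of the filtering heuristic described above, assuming the gene mentions for a paper are already available as a flat list of names. The function name `filter_gene_mentions`, the `ratio_threshold` default, and the `min_mentions` default are all hypothetical; the issue does not specify a cutoff, so a real value would need tuning.

```python
from collections import Counter
from typing import Iterable, List


def filter_gene_mentions(
    mentions: Iterable[str],
    total_genes: int,
    ratio_threshold: float = 0.01,  # assumed cutoff, not taken from the issue
    min_mentions: int = 2,          # "mentioned only once" means count < 2
) -> List[str]:
    """Return the gene names to keep for a paper.

    If the number of distinct genes in the paper, divided by the total
    number of C. elegans genes, exceeds ratio_threshold, genes mentioned
    fewer than min_mentions times are dropped. Otherwise all genes are kept.
    """
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")

    counts = Counter(mentions)
    ratio = len(counts) / total_genes

    if ratio <= ratio_threshold:
        # Ordinary paper: keep every gene that was mentioned.
        return sorted(counts)

    # Likely a high-throughput paper: drop genes mentioned only once.
    return sorted(gene for gene, n in counts.items() if n >= min_mentions)
```

For example, with `mentions = ["lin-12", "lin-12", "glp-1", "daf-16"]`, a small `total_genes` pushes the ratio over the threshold and only `lin-12` survives, while a realistic `total_genes` keeps all three genes.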
