Read gene names from postgres pap_gene table instead of Textpresso categories #38

Open
valearna opened this issue Feb 25, 2021 · 1 comment

Comments

@valearna
Collaborator

Textpresso category queries are too slow, and pap_gene already contains many gene names. Future automated extraction pipelines should populate pap_gene with gene names that match those extracted by tpc.

@valearna
Collaborator Author

For now we can continue to use the list of genes from Textpresso. For papers that mention a large fraction of the genome (# genes in paper / total # C. elegans genes), we could remove genes that are mentioned only once. This would take care of high-throughput experiments. Reading genes from postgres would still be faster, but we need to wait for a pipeline that can extract genes from the full text and not only from abstracts.
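A minimal sketch of the filtering heuristic described above, assuming the per-paper mention counts are already available as a dictionary. The function name `filter_gene_mentions`, the `ratio_threshold` parameter, and its default value of 0.05 are hypothetical and chosen only for illustration; the issue does not specify a cutoff.

```python
def filter_gene_mentions(mention_counts, total_genes, ratio_threshold=0.05):
    """Drop single-mention genes from papers that mention many genes.

    mention_counts: dict mapping gene name -> number of mentions in the paper.
    total_genes: total number of C. elegans genes (supplied by the caller).
    ratio_threshold: hypothetical cutoff, not taken from the issue.
    """
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")

    # Fraction of the genome mentioned in this paper.
    gene_ratio = len(mention_counts) / total_genes

    # Papers below the cutoff keep every gene.
    if gene_ratio <= ratio_threshold:
        return sorted(mention_counts)

    # Likely high-throughput paper: keep only genes mentioned more than once.
    return sorted(gene for gene, count in mention_counts.items() if count > 1)
```

Passing the total gene count as an argument keeps the function independent of any particular data source, so the same logic would apply whether the counts come from Textpresso or, later, from pap_gene.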
