Improves affiliation capture from Crossref #105

Merged
merged 1 commit into from
May 8, 2024

seasidesparrow
Member

modified:   adsingestp/parsers/crossref.py
modified:   tests/stubdata/output/crossref_cn_10.1093=mnras=stac2975.json
modified:   tests/stubdata/output/crossref_cn_10.1093=pasj=psac053.json

This PR addresses Issue #96. The broader issue of capturing ROR and other affiliation identifier data is raised in Issue #104 and will be the subject of a separate PR.

Contributor

@mugdhapolimera mugdhapolimera left a comment

Everything looks good. Just a small question about the use of find vs. find_all.

@@ -265,6 +265,30 @@ def _parse_contrib(self):
    affil = [a.get_text() for a in c.find_all("affiliation")]
    if affil:
        contrib_tmp["aff"] = affil
    elif c.find("affiliations"):
Contributor

Is there ever a situation where a single author has more than one <affiliations> tag? Please check whether there is a use case for find_all rather than find here.

Member Author

Not every publisher uses the <affiliations> structure; Elsevier is an example of one that does not. OUP and MDPI both do, and none of their Crossref records contain more than one <affiliations> tag per author, so using find should be sufficient.

Contributor

sounds good!
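
The find vs. find_all question in this thread can be illustrated with a minimal sketch. It is not the code from this PR: it uses the standard-library ElementTree (`find`/`findall`) as a stand-in for the parser's `find`/`find_all` calls so that it runs without extra dependencies. The sample XML, the `<institution>`/`<institution_name>` children, and the helper names `extract_affiliations` and `count_wrappers` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical Crossref-style contributors block (not taken from a real record).
SAMPLE = """
<contributors>
  <person_name sequence="first" contributor_role="author">
    <given_name>A.</given_name>
    <surname>Author</surname>
    <affiliation>Old-style University</affiliation>
  </person_name>
  <person_name sequence="additional" contributor_role="author">
    <given_name>B.</given_name>
    <surname>Writer</surname>
    <affiliations>
      <institution><institution_name>Institute One</institution_name></institution>
      <institution><institution_name>Institute Two</institution_name></institution>
    </affiliations>
  </person_name>
</contributors>
"""


def extract_affiliations(person):
    """Return affiliation strings for one <person_name> element."""
    # Flat form: zero or more <affiliation> children holding plain text.
    flat = [a.text.strip() for a in person.findall("affiliation") if a.text]
    if flat:
        return flat
    # Nested form: a single <affiliations> wrapper is assumed, so find()
    # (first match only) is enough; findall() would be needed otherwise.
    wrapper = person.find("affiliations")
    if wrapper is None:
        return []
    return [n.text.strip() for n in wrapper.iter("institution_name") if n.text]


def count_wrappers(person):
    """Count <affiliations> wrappers, to test the one-per-author assumption."""
    return len(person.findall("affiliations"))


root = ET.fromstring(SAMPLE)
people = root.findall("person_name")
results = [extract_affiliations(p) for p in people]
wrapper_counts = [count_wrappers(p) for p in people]
```

A check like `count_wrappers` could be run over stub data from each publisher to confirm that no author carries more than one wrapper; if one ever did, only the first wrapper would be read by the `find`-based branch.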

Contributor

@mugdhapolimera mugdhapolimera left a comment

Everything looks good. Feel free to merge.

@seasidesparrow seasidesparrow merged commit 6e254d1 into adsabs:main May 8, 2024
4 checks passed