Commit

Update harvest_solr.py to be robust when abstracts are missing
Thomas-S-Allen committed Mar 13, 2024
1 parent 04e2bd7 commit 5d390b8
Showing 1 changed file with 9 additions and 1 deletion.
harvest_solr.py: 10 changes (9 additions, 1 deletion)
@@ -151,9 +151,17 @@ def transform_r_json(r_json):
     """
 
     # extract the needed information
+    # Bibcoded and titles are always present
     bibcodes = [doc['bibcode'] for doc in r_json['response']['docs']]
     titles = [doc['title'][0] for doc in r_json['response']['docs']] # without [0] it returns a list
-    abstracts = [doc['abstract'] for doc in r_json['response']['docs']]
+    # abstracts = [doc['abstract'] for doc in r_json['response']['docs']]
+    # Abstracts are not always present
+    abstracts = []
+    for doc in r_json['response']['docs']:
+        if 'abstract' in doc:
+            abstracts.append(doc['abstract'][0])
+        else:
+            abstracts.append('')
 
     # list of dictionaries with the bibcode, title, and abstract for each record
     record_list = [{'bibcode': bibcodes[i],
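The missing-abstract guard can also be expressed with `dict.get`, which returns `None` for an absent key instead of raising `KeyError`. What follows is a minimal sketch, not the code in harvest_solr.py. `first_value` and `transform_docs` are hypothetical names, and the sample response and bibcodes are made-up placeholders. The commit's loop takes `doc['abstract'][0]`, which assumes the field arrives as a list, while the removed line used `doc['abstract']` as-is. Whether a given Solr field comes back as a string or a list depends on how it is defined in the schema, so the sketch assumes either shape is possible and handles both.

```python
from typing import Any


def first_value(doc: dict[str, Any], field: str, default: str = "") -> str:
    """Return a field as a string, or `default` if it is absent or empty.

    Assumption: the field may be a plain string or a list of strings
    (a multivalued Solr field); for a list, the first item is used.
    """
    value = doc.get(field)  # None when the key is missing
    if value is None:
        return default
    if isinstance(value, list):
        return value[0] if value else default
    return value


def transform_docs(r_json: dict[str, Any]) -> list[dict[str, str]]:
    """Hypothetical stand-in for transform_r_json: one record per Solr doc."""
    docs = r_json["response"]["docs"]
    return [
        {
            "bibcode": doc["bibcode"],                  # assumed always present
            "title": first_value(doc, "title"),         # list in the response
            "abstract": first_value(doc, "abstract"),   # may be missing
        }
        for doc in docs
    ]


# Made-up response: the second document has no abstract.
sample = {
    "response": {
        "docs": [
            {"bibcode": "2024TEST....1A", "title": ["First"], "abstract": "Text."},
            {"bibcode": "2024TEST....2B", "title": ["Second"]},
        ]
    }
}
records = transform_docs(sample)
print(records[1]["abstract"] == "")  # True
```

Building each record in a single pass keeps bibcode, title, and abstract aligned per document, rather than relying on three parallel lists staying in the same order.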
