ADSOrcid

Short summary

ORCID metadata enrichment pipeline: grabs claims from the API and uses them to enrich the ADS storage/index.

How it works:

1. periodically checks the ADS API (using a special OAuth token that gives access to ORCID updates)
1. fetches the claims and puts them into the RabbitMQ queue
1. a worker grabs each claim and enriches it with information about the author (querying both the
   public ORCID API for the author's name and the ADS API for variants of the author name)
1. given the info above, it updates MongoDB (collection orcid_claims): it marks the claim
   either as 'verified' (if it comes from a user with an account in BBB) or 'unverified'

   (it is the responsibility of the ADS Import pipeline to pick up ORCID claims and send them to
   SOLR for indexing)
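The enrichment and marking steps above can be sketched in Python. This is a minimal illustration, not the pipeline's own code: `Claim`, `enrich_claim`, `mark_claim`, and the plain dictionaries standing in for the ORCID and ADS API lookups and for the set of BBB account holders are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Claim:
    # HYPOTHETICAL claim shape, for illustration only.
    bibcode: str
    orcidid: str
    author_names: List[str] = field(default_factory=list)
    status: Optional[str] = None


def enrich_claim(claim: Claim,
                 orcid_names: Dict[str, str],
                 ads_variants: Dict[str, List[str]]) -> Claim:
    """Attach the ORCID name first, then ADS name variants, dropping duplicates.

    orcid_names / ads_variants are dict stand-ins for the public ORCID API
    and the ADS API lookups described above.
    """
    names: List[str] = []
    if claim.orcidid in orcid_names:
        names.append(orcid_names[claim.orcidid])
    names.extend(ads_variants.get(claim.orcidid, []))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    claim.author_names = list(dict.fromkeys(names))
    return claim


def mark_claim(claim: Claim, bbb_orcidids: Set[str]) -> Claim:
    """'verified' if the ORCID iD belongs to a user with a BBB account, else 'unverified'."""
    claim.status = "verified" if claim.orcidid in bbb_orcidids else "unverified"
    return claim
```

In the real pipeline the lookups are network calls and the result is written to the orcid_claims collection; here they are reduced to dictionary lookups so the decision logic stands on its own.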

Queues

check-orcidid: compares our stored version of the ORCID works against ORCID's version; records claims with appropriate statuses; sends tasks (individual claims) to the record-claim queue
record-claim: receives a single claim; checks that the bibcode exists; updates the claim with more author information; passes the claim to the match-claim queue
match-claim: verifies (or rejects) claims from record-claim; records approved claims; passes approved claims to the output-results queue
output-results: sends results to another pipeline to be incorporated into the record
check-updates: checks the ORCID microservice for updated profiles; if it finds any, sends them to check-orcidid
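The hand-offs between these queues can be simulated in-process to make the flow concrete. This sketch replaces RabbitMQ with `collections.deque` objects; the queue names come from the list above, while `run_pipeline`, the message shapes, and the demo handlers are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Optional

# Downstream queue of each stage, following the Queues list (None = leaves this pipeline).
NEXT_QUEUE: Dict[str, Optional[str]] = {
    "check-updates": "check-orcidid",
    "check-orcidid": "record-claim",
    "record-claim": "match-claim",
    "match-claim": "output-results",
    "output-results": None,
}

# A handler consumes one message and returns the messages to forward
# (an empty list means the message stops here, e.g. a rejected claim).
Handler = Callable[[dict], List[dict]]


def run_pipeline(start: str, message: dict, handlers: Dict[str, Handler]) -> List[dict]:
    """Push one message through the stages; return what output-results would send on."""
    queues = {name: deque() for name in NEXT_QUEUE}
    queues[start].append(message)
    delivered: List[dict] = []
    while any(queues.values()):
        for name, queue in queues.items():
            while queue:
                for out in handlers[name](queue.popleft()):
                    target = NEXT_QUEUE[name]
                    if target is None:
                        delivered.append(out)
                    else:
                        queues[target].append(out)
    return delivered


# HYPOTHETICAL demo handlers: a profile fans out into one claim per work,
# claims with unknown bibcodes are dropped, the rest are approved.
KNOWN_BIBCODES = {"bibcode-A", "bibcode-B"}

DEMO_HANDLERS: Dict[str, Handler] = {
    "check-updates": lambda profile: [profile],
    "check-orcidid": lambda profile: [
        {"orcidid": profile["orcidid"], "bibcode": b} for b in profile["works"]
    ],
    "record-claim": lambda claim: [claim] if claim["bibcode"] in KNOWN_BIBCODES else [],
    "match-claim": lambda claim: [dict(claim, approved=True)],
    "output-results": lambda claim: [claim],
}

if __name__ == "__main__":
    profile = {"orcidid": "0000-0002-1825-0097",
               "works": ["bibcode-A", "bibcode-B", "bibcode-X"]}
    for claim in run_pipeline("check-updates", profile, DEMO_HANDLERS):
        print(claim)
```

With the demo handlers, the profile fans out into three claims at check-orcidid, and only the two with known bibcodes reach output-results.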

dev setup - vagrant (docker)

vim ADSOrcid/local_config.py #edit, edit
vagrant up db rabbitmq app
vagrant ssh app
cd /vagrant

This will start the pipeline inside the app container; if you have configured the endpoints and access tokens correctly, it starts fetching data from ORCID.

We are using the 'docker' provider (i.e., instead of a VirtualBox VM, you run the processes in docker). On some systems, it is necessary to run `export VAGRANT_DEFAULT_PROVIDER=docker` or to always specify `--provider docker` when you run vagrant.

The directory is synced to /vagrant/ on the guest.

dev setup - local editing

If you (also) hate it when stuff is unnecessarily complicated, you can run and develop locally instead (using whatever editor/IDE/debugger you like):

virtualenv python
source python/bin/activate
pip install -r requirements.txt
pip install -r dev-requirements.txt
vagrant up db rabbitmq

This will set up a Python virtualenv and start the database and RabbitMQ containers. You can then run the pipeline and tests locally.

RabbitMQ

vagrant up rabbitmq

RabbitMQ will be on localhost:6672, and its administrative interface on localhost:25672.
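Before starting the pipeline, it can help to confirm that these ports answer. A minimal standard-library sketch; `port_open` is a hypothetical helper, and a successful TCP connect only shows that something is listening on the port, not that it is RabbitMQ.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    # Ports as mapped by `vagrant up rabbitmq`.
    for label, port in [("RabbitMQ (AMQP)", 6672), ("RabbitMQ admin", 25672)]:
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{label} on localhost:{port}: {state}")
```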

Database

vagrant up db

MongoDB is on localhost:37017, PostgreSQL on localhost:6432
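When running locally, the configuration has to point at these ports. The sketch below builds connection URLs for the RabbitMQ and database ports given above; the function name, the `guest`/`postgres` credentials, and the `adsorcid` database name are placeholder assumptions, so check config.py for the settings the pipeline actually reads.

```python
from urllib.parse import quote

# Ports from the RabbitMQ and Database sections of this README.
RABBITMQ_PORT = 6672
MONGODB_PORT = 37017
POSTGRES_PORT = 6432


def build_dev_urls(host: str = "localhost",
                   amqp_user: str = "guest", amqp_password: str = "guest",
                   pg_user: str = "postgres", pg_password: str = "postgres",
                   db_name: str = "adsorcid") -> dict:
    """Return connection URLs for the dev containers (credentials and db name are placeholders)."""
    def cred(user: str, password: str) -> str:
        # Percent-encode so passwords containing '@', ':' or '/' do not break the URL.
        return f"{quote(user, safe='')}:{quote(password, safe='')}"

    return {
        "broker": f"amqp://{cred(amqp_user, amqp_password)}@{host}:{RABBITMQ_PORT}/",
        "mongodb": f"mongodb://{host}:{MONGODB_PORT}/{db_name}",
        "postgres": f"postgresql://{cred(pg_user, pg_password)}@{host}:{POSTGRES_PORT}/{db_name}",
    }


if __name__ == "__main__":
    for name, url in build_dev_urls().items():
        print(f"{name}: {url}")
```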

production setup

vagrant up prod

It will automatically download and install the latest release from GitHub (not your local changes, only what is on GitHub).

If your ADSOrcid/prod_config.py is available, it will be copied and used in place of local_config.py.
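The prod_config.py override amounts to a copy-if-present step. A minimal sketch, assuming both files sit in the same directory; `apply_prod_config` is hypothetical and is not the provisioning code that `vagrant up prod` runs.

```python
import shutil
from pathlib import Path


def apply_prod_config(app_dir: Path) -> Path:
    """Copy prod_config.py over local_config.py when present; return the config path in effect."""
    prod = app_dir / "prod_config.py"
    local = app_dir / "local_config.py"
    if prod.is_file():
        # Overwrites any existing local_config.py with the production settings.
        shutil.copyfile(prod, local)
    return local
```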

No ports are exposed and no SSH access is possible. New releases will be deployed automatically.

Typical installation:

vim ADSOrcid/prod_config.py # edit, edit...
vagrant up prod

production setup - docker way

cd manifests/production/app
docker build -t adsorcid .   # image names must be lowercase; docker build has no --name option
cd ../../..
vim prod_config.py # edit, edit...
docker run -d -v "$(pwd)":/vagrant/ --name ADSOrcid adsorcid /sbin/my_init

Here are some useful commands:

restart service

docker exec ADSOrcid sv restart app

tail log from one of the workers

docker exec ADSOrcid tail -f /app/logs/ClaimsImporter.log

Maintainers

Kelly, Roman
