wormbase-names


General overview

The main functionality of this repository is to provide a web service that facilitates the sharing of identifiers and names of a subset of WormBase data types.
The web service comprises:

  • A REST API (+ swagger documentation) for reading and manipulating data from the service (full CRUD support). Main code to be found here.
  • A Web interface, providing forms to perform operations via the REST API.

More general features of the codebase include:

  • User authentication (against the wormbase.org Google organisation)
  • Provenance provisioning (who, when, where and why) with every write operation, modelled as attributes on the "transaction entity" in Datomic.
  • Schema and database related functions
    The schema and related database functions are intended to evolve to eventually become the "central" database within the WormBase architecture.
  • Serialisation of events to a queueing system, such that those events can be "replayed" into various ACeDB databases.

Authentication

Login to the Name Service happens through your *@wormbase.org Google account. Hit the "Login With Google" button and select the correct account or fill in the required details.

For more details on the Authentication and how to obtain a token for API authentication, see the Google-Auth docs.
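
As a purely illustrative sketch (the host, endpoint path and header scheme below are assumptions, not documented API details), an authenticated API call might look like:

curl -H "Authorization: Token <api-token>" \
     "https://names.wormbase.org/api/gene/WBGene00000001"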

Development

Coding style

The coding style for all clojure code in this project tries to adhere to both the Clojure style guide and how to ns, both of which are supported by the source code linter clj-kondo.

To run the source code linter:

for d in src test; do
  clj -A:clj-kondo --lint $d
done

Requirements

Ensure you've installed the following software on your system to enable local building, testing and troubleshooting:

Docker credentials

The Makefile target ecr-login will, by default, store the authentication token unencrypted in the file ~/.docker/config.json.
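
Typical usage (the profile name is illustrative):

make ecr-login AWS_PROFILE=wormbase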

Docker supports credential-helper plugins that can store these tokens encrypted, but the appropriate plugin varies by operating system.

On Linux, docker-credential-pass can be used together with pass, which encrypts tokens with a GPG2 key.
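
A minimal sketch of wiring this up, assuming docker-credential-pass is on your PATH and pass has been initialised with a GPG2 key:

# Initialise pass with an existing GPG2 key (the key id is illustrative)
pass init "my-gpg-key-id"
# Then tell Docker to use the pass helper by adding "credsStore"
# to ~/.docker/config.json:
#   { "credsStore": "pass" }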

Google OAuth 2.0 Client secrets

At runtime (both UI and API), the application requires a set of environment variables in order to authenticate through the Name Service application's Google OAuth 2.0 Client.

The Name Service OAuth 2.0 Clients can be found on the wormbase-names-service Google console credentials page. Under OAuth 2.0 Client IDs:

  • Click WormBase Names Service (Web - Dev) for details used for local development and testing
  • Click WormBase Names Service (Web - Prod) for details used for deployments to AWS (both test and prod environments)

The respective OAuth 2.0 Clients' client-id and client-secret are stored in AWS SSM, and are automatically retrieved (through google-oauth2-secrets) and set by the execution and deployment targets in the Makefile. Before executing any of those targets with make, ensure the correct AWS_PROFILE or other relevant AWS environment variables are set and exported to allow access to those SSM parameters.
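
For example, an individual secret could be inspected with the AWS CLI (the parameter name here is hypothetical; the Makefile handles the real lookups):

aws ssm get-parameter --name "/wormbase-names/google-oauth2-client-secret" \
    --with-decryption --query 'Parameter.Value' --output text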

REST API

To run the REST API locally, define the (local) Datomic DB URI in the env variable WB_DB_URI, and the URI to use during Google authentication in the env variable GOOGLE_REDIRECT_URI.
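
For example (the values below match the defaults used elsewhere in this README):

export WB_DB_URI="datomic:mem://localhost:4334/names"
export GOOGLE_REDIRECT_URI="http://lvh.me:3000"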

An example of a valid Datomic URI is datomic:mem://localhost:4334/names. No transactor setup is needed for this in-memory database URI. For a persistent database (like ddb-local), a transactor needs to be configured, in which case WB_DB_URI is based on your transactor configuration and database name. Make sure to define the DATOMIC_EXT_CLASSPATH env variable to point to the wormbase/ids jar when setting up the transactor (see these instructions to build the ids jar).

export DATOMIC_EXT_CLASSPATH="$HOME/git/wormbase-names/ids/target/wbids.jar"

When using a ddb-local transactor, ensure AWS environment variables are set with mock credentials, as sketched below.
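
A minimal sketch of mock credentials (ddb-local does not validate them, so the values are placeholders):

export AWS_ACCESS_KEY_ID="dummy"
export AWS_SECRET_ACCESS_KEY="dummy"
export AWS_DEFAULT_REGION="us-east-1"

With those set, launch the local REST API service: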

make run-dev-webserver PORT=[port] WB_DB_URI=[datomic-uri] GOOGLE_REDIRECT_URI="http://lvh.me:3000"

To allow the UI webpackDevServer to proxy to the ring server, the ring server has to run at the host and port configured in the "proxy" section of client/package.json (port 4010 by default).
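
For reference, create-react-app reads this from a top-level entry in client/package.json of the form "proxy": "http://localhost:4010" (assuming the default port above).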

When the API is running locally, the API documentation should be available at http://localhost:4010/api-docs/.
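
A quick smoke test once the server is running (default port assumed):

curl -sI http://localhost:4010/api-docs/ | head -n 1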

Tools

Running a Clojure REPL

Examples

  • Emacs + CIDER:
# Example. The `:mvn/version` of nrepl changes frequently; CIDER/Emacs will prompt when an upgrade is required.
clj -A:datomic-pro:webassets:dev -Sdeps '{:deps {cider/cider-nrepl {:mvn/version "0.23.0"}}}' -m nrepl.cmdline --middleware "[cider.nrepl/cider-middleware]"
  • "Vanilla" REPL:
clj -A:datomic-pro:webassets:dev -m nrepl.cmdline

From time to time it is good to check for outdated dependencies. This can be done via the following command:

clj -A:outdated

Debug printing

Standard logging is done using ch.qos.logback/logback (-classic and -core), but this is not flexible enough for small-scale (local) debug printing while coding and debugging. To enable more flexible debug printing:

  1. In the file where you want to enable temporary debug printing, replace the require of the clojure.tools.logging library with the taoensso.timbre library, keeping the same log alias (e.g. [taoensso.timbre :as log] in place of [clojure.tools.logging :as log]).

  2. Add the following line below the library loading section

    (log/merge-config! {:level :debug})

    This will enable all log printing up to debug level within that file, but leave the application-default logging outside of it, enabling a more focused log inspection.

  3. Add any additional debug logging with the standard (log/debug "string" object)

taoensso.timbre will, by default, print logs in the following format:

<date> <timestamp> <device-name> <LOGLEVEL> [<namespace>] - <log message>

Client app (web interface)

Correct functionality of the client app can be tested in two ways:

  • Running a client development server (during development), to test individual functionality. See instructions below.
  • Making a production build of the client app. Failure during this process means fixes will be needed before deployment.
# performs an npm clean install of dependencies based on package-lock.json
make build-ui

To start up a local client development server:

  1. Ensure the back-end application is running and an API endpoint is available locally (see above)

  2. Ensure client/package.json has proxy configured to point at the backend API, at the correct port (default 4010).

  3. Ensure the correct AWS_PROFILE or other relevant AWS environment variables are set and exported to allow access to the necessary google client secrets (stored as AWS SSM parameters).

  4. Run (bash):

    make run-dev-ui
    This will start a service serving the client assets on port 3000.
  5. Finally, ensure the authentication callback URL at the Google Cloud Console is configured to match the client development server configuration. Under OAuth 2.0 Client IDs, click "WormBase Names Service (Web - Dev)" and check the "Authorized JavaScript origins" section.

Notes:

  • Node.js and NPM
    • This client requires compatible versions of node.js and NPM, as specified in the engines property of package.json. The easiest way to use the right versions of node.js and NPM is through the Node Version Manager (nvm); see the sketch after these notes.
    • To invoke nvm automatically, set up Deeper Shell Integration by following the nvm documentation.
  • Create-React-App
    • client/ is bootstrapped with create-react-app, where you can find out more about its setup and capabilities
  • Port:
    • To run the client on a different port:
PORT=[PORT] npm run start
  • Dependencies:
    • Most errors about missing dependencies can be resolved with npm install, which installs dependencies into the ./node_modules directory. It's safe to delete the content of ./node_modules and/or re-run npm install.
    • Be sure to check in changes to package-lock.json, which specifies the exact versions of the npm packages installed, and allows package installation to happen in a reproducible way with npm ci.
  • Mock:
    • Ajax calls made through the mockFetchOrNot function allow one to provide a mock implementation of an API call, in addition to the native API call.
    • Whether the mock implementation or the native implementation is invoked is determined by the 3rd argument (shouldMock) passed to the mockFetchOrNot function.
    • shouldMock defaults to the REACT_APP_SHOULD_MOCK environment variable when it's not passed in as an argument.
  • Directory structure
    • create-react-app is responsible for the directory structure of client/ except client/src, and relies on it staying this way.
    • client/src primarily consists of
      • containers: React components involving business logic
      • components/elements: React components involving only appearance and/or UI logic
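
A minimal nvm sketch for the Node.js note above (the version number is illustrative; use whatever satisfies the engines property in client/package.json):

nvm install 12   # install a compatible node.js (+ bundled npm) version
nvm use 12       # activate it for the current shell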

Testing

Use the built-in testing utilities provided by your environment, or use the make command below to run all tests. Ensure all tests pass before committing large code changes, before submitting new pull requests, and before deploying to any live AWS environment (test or production).

make run-tests

Release & deployment

As described in the intro, the name service consists of several components, for which release versioning and deployment steps differ:

  • Main application (REST API + web client)
    • Versioned through the repository git tags
    • Deployed through AWS EB (& ECR)
  • IDs clojure library
    • Manually versioned through the ids/pom.xml file (and clojars)
    • Library deployed to clojars (thin jar)
    • Datomic transactors (which use this library) deployed through AWS CloudFormation
  • Export package
    • Not versioned
    • Deployed to S3 (uber jar)

When a release and deployment is required for both the IDs library and the main application, deploy the IDs library first, then update the transactors, and lastly the main application.

Requirements

Ensure you've installed the following software on your system to enable building, testing and deployment:

Deploying the application (REST API + client)

First time setup

Before being able to deploy for the first time (after creating a new local clone of the repository), a local EB environment must be configured.

The --profile option is optional, but saves a default profile, which spares you from having to provide your profile name as an input argument or bash environment variable on every EB operation (when it's not "default").

eb init [--profile <aws-profile-name>]

This command will interactively walk you through saving some EB configurations in the .elasticbeanstalk directory. Provide the following parameters when prompted:

  • Default region: us-east-1
  • Application to use: names
  • Default environment: wormbase-names-test (this prevents accidental deployment to the production environment)
  • CodeCommit?: N

Update release & deployment

Deploying the main application is a 3-step process:

  1. Release the code: bump the revision and push.
  2. Build the application and deploy it to the AWS Elastic Container Registry (ECR).
  3. Deploy the application to AWS Elastic Beanstalk.

The release and deployment process heavily uses make for its automation. For a full list of all available make commands, type:

make help

To deploy an update for the main application, change your working dir to the repository root dir and execute the following commands (bash):

# Specify $LEVEL as one of <major|minor|patch>.
# This will bump the x, y or z version number.
# SLF4J messages can be ignored (warnings, not errors).
# Clashing jar warnings can be ignored.
make vc-release LEVEL=$LEVEL

# Print the version being deployed and confirm its correctness (e.g. prevent DIRTY deployments to production)
make show-version

# Once confirmed to be correct, push the created tag to github
git push --follow-tags

# Before building the application, ensure docker (daemon) is running.
# If not, start it. On Ubuntu you can do so with the following cmd:
sudo service docker start

# Build the application and deploy the docker image to the AWS Elastic Container Registry (ECR)
# NOTE: To deploy a tagged or branched code version that does not equal your (potentially dirty) working-dir content,
#       use the additional argument REF_NAME=<ref-name>
#       E.g. make release AWS_PROFILE=wormbase REF_NAME=wormbase-names-1.4.7
make release [AWS_PROFILE=<profile_name>] APP_PROFILE=prod

# Deploy the application to an EB environment.
# Before execution:
# * Ensure the correct EB environment name is specified
#   (otherwise deployment to a non-existing dev environment will be attempted)
# * Check whether the hard-coded WB_DB_URI default (see Makefile) applies.
#   If not, define WB_DB_URI to point to the appropriate datomic DB.
# * Ensure the correct GOOGLE_REDIRECT_URI is defined for Google authentication (http://lvh.me:3000 when developing locally)
# Executing this make target will automatically set the required execution environment variables
# for Google Oauth2 authentication, through EB (retrieved from AWS SSM).
make eb-deploy PROJ_NAME=<env-name> [GOOGLE_REDIRECT_URI=<google-redirect-uri>] [WB_DB_URI=<datomic-db-uri>] [AWS(_EB)?_PROFILE=<profile_name>]
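
For illustration, a complete patch release to the test environment might look like this (the profile and environment names are examples):

make vc-release LEVEL=patch
make show-version
git push --follow-tags
make release AWS_PROFILE=wormbase APP_PROFILE=prod
make eb-deploy PROJ_NAME=wormbase-names-test AWS_PROFILE=wormbase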

Deploying the IDs library

For instruction about developing, building and deploying the IDs library sub-project, see the sub-project's README.

Other tasks

Importing GeneACe export data in datomic DB

Conventionally, the export files have been named in the form DDMMYYYY_<topic>, and we give the datomic database a corresponding name.

The best way to run the imports is against a local datomic:ddb-local or datomic:dev transactor.

e.g. dynamodb-local:

export WB_DB_URI="datomic:ddb-local://localhost:8000/WSNames/12022019"  # The DynamoDB table here is `WSNames`

See here for instructions on creating a local DynamoDB database.
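
As a sketch, DynamoDB Local can be launched on port 8000 from wherever its distribution was unpacked (the paths are illustrative):

cd ~/dynamodb-local
java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -port 8000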

Import genes

This import pipeline takes two files:

  • <current_status_tsv> : A file containing the columns: WBGeneID, Species, Status, CGC Name, Sequence Name, Biotype.
  • <actions_tsv>: A file containing WBGeneID, WBPersonID, Timestamp, Event Type
clojure -A:dev:datomic-pro -m wormbase.names.importer \
 gene <current_status_tsv> <actions_tsv>

At time of writing (as of WormBase release WS270), the gene import pipeline takes ~5 hours to run.

Import variations

The variations export data is provided in a single file (no provenance is attached).

clojure -A:dev:datomic-pro -m wormbase.names.importer variation <variations_tsv>

At time of writing (as of WormBase release WS270), the variations import pipeline takes ~5 mins to run.

Import Sequence Features

We do not attempt to replay all Sequence Features from an export; instead we just record the latest ID and status.

From a fresh database install, enter the following in a REPL session after exporting the WB_DB_URI environment variable appropriately:

(require '[environ.core :refer [env]])
(require '[datomic.api :as d])
(def conn (d/connect (:wb-db-uri env)))
@(d/transact conn [{:sequence-feature/id "<latest-id>", :sequence-feature/status :sequence-feature.status/live}
                   {:db/id "datomic.tx", :provenance/why "Initial import", :provenance/who [:person/id "YourWBPersonID"]}])

Restoring the datomic database to AWS

Creation of a new remote DynamoDB database should be done via the AWS CLI or web console (outside of the scope of this document).

Follow the "standard" backup-and-restore method, for example:

mkdir $HOME/names-db-backups
cd ~/datomic-pro/datomic-pro-0.9.5703
bin/datomic backup-db $LOCAL_DATOMIC_URI file://$HOME/names-db-backups/names-db

Before restoring the database:

  • Make a note of the current value of write-capacity
  • Increase the write-capacity of the DDB table via the AWS CLI/web console to be 1000 (or more), then run the restore command shown below.
bin/datomic restore-db file://$HOME/names-db-backups/names-db $REMOTE_DATOMIC_URI
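
For reference, the capacity can be inspected and adjusted with the AWS CLI (the table name and read capacity below are illustrative):

# Note the current provisioned throughput
aws dynamodb describe-table --table-name WSNames --query 'Table.ProvisionedThroughput'
# Raise the write capacity for the restore (update-table requires both values)
aws dynamodb update-table --table-name WSNames \
    --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=1000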

After the process concludes, restore the write-capacity back to its original value.

Ensure the application is configured via the .ebextensions/app-env.config file to match $REMOTE_DATOMIC_URI. After deploying a release, verify that the URI has changed in the Elastic Beanstalk configuration section.

Exporting names data to CSV

The primary function of the export output is reconciliation of the datomic names db against an ACeDB database.
The IDs, names and status of each entity in the database are output as CSV.

A jar file is deployed to the WormBase S3 bucket (s3://wormbase/names/exporter/wb-names-export.jar) for convenience.

# export genes
java -cp <path-to-wormbase-names-export.jar> clojure.main -m wormbase.names.export genes /tmp/genes.csv

# export variations
java -cp <path-to-wormbase-names-export.jar> clojure.main -m wormbase.names.export variations /tmp/variations.csv

The exporter can also be run from a checkout of this repository:

cd <wormbase-names_checkout>/export

# export genes
clojure -A:dev:datomic-pro -m wormbase.names.export genes /tmp/genes.csv

# export variations
clojure -A:dev:datomic-pro -m wormbase.names.export variations /tmp/variations.csv

License

EPL (Eclipse Public License)
Copyright © WormBase 2018, 2019