A custom updater wrapping the regular Wikidata updater for updating multiple sites in multiple namespaces.

wbstack/queryservice-updater

ℹ️ Issues for this repository are tracked on Phabricator.

Try to push this upstream somehow...

https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/589408/

The idea of speeding up the updater from version 1

The idea is to avoid the JVM startup cost and to keep an updater running all the time, rather than shelling out to a new process, which is what the version 1 glue did.
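As a rough illustration of the "always running" idea, the sketch below keeps a single JVM alive and hands each batch to an in-process call. It is not the code in this repository: `ResidentUpdaterLoop`, `nextBatch`, and `runUpdate` are hypothetical names, and `runUpdate` only stands in for whatever call invokes the wrapped updater.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch of a resident updater: one JVM stays up and handles many batches. */
final class ResidentUpdaterLoop {

    /**
     * Polls for work and runs each batch in-process instead of starting a new JVM.
     *
     * @param nextBatch   hypothetical source of updater arguments; returns null when idle
     * @param runUpdate   hypothetical stand-in for invoking the wrapped updater
     * @param loopLimit   number of polls before the loop exits
     * @param sleepMillis pause after each poll, in milliseconds
     * @return how many batches were handed to runUpdate
     */
    static int run(Supplier<String[]> nextBatch, Consumer<String[]> runUpdate,
                   long loopLimit, long sleepMillis) {
        int processed = 0;
        for (long i = 0; i < loopLimit; i++) {
            String[] updaterArgs = nextBatch.get();
            if (updaterArgs != null) {
                runUpdate.accept(updaterArgs);
                processed++;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                break;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Demo: one queued batch, three polls, no sleep between polls.
        Deque<String[]> pending = new ArrayDeque<>();
        pending.add(new String[] {"--sparqlUrl", "sparql"});
        int done = run(pending::poll,
                a -> System.out.println("update " + String.join(" ", a)), 3, 0);
        System.out.println("batches processed: " + done);
    }
}
```

The JVM is started once, and every later batch only pays for the work inside `runUpdate`.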

I also investigated Nailgun, but it's not really maintained anymore, so it's probably not a good direction to go in.

This can be trialed at https://repl.it/languages/java. Just copy the code from the class into the default Main.java there and run the two commands below.

javac -classpath .:/run_dir/junit-4.12.jar:target/dependency/* -d . Main.java
java -classpath .:/run_dir/junit-4.12.jar:target/dependency/* org.wikidata.query.rdf.tool.WbStackUpdate --sparqlUrl sparql

This updater approach would require minimal changes to the query service code and would use the WHOLE of the current updater. Pulling in new code would be easy; the only thing we would need to look out for is changes to the params passed into the updater that we manipulate. The runUpdater.sh script could also be altered to take the main class from an ENV var, and voilà!
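The "main class from an ENV var" idea is described above as a change to runUpdater.sh. The sketch below shows the same selection done in Java with reflection, purely as an illustration. The variable name `UPDATER_MAIN_CLASS` and the classes `MainClassLauncher` and `EchoMain` are assumptions made for this example; none of them come from this repository or from the query service code.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Map;

/** Sketch: choose the main class from an environment-style map and call it reflectively. */
final class MainClassLauncher {
    // Hypothetical variable name; the README does not name one.
    static final String MAIN_CLASS_VAR = "UPDATER_MAIN_CLASS";

    static void launch(Map<String, String> env, String defaultMainClass, String[] args) {
        String className = env.getOrDefault(MAIN_CLASS_VAR, defaultMainClass);
        try {
            Method main = Class.forName(className).getMethod("main", String[].class);
            main.invoke(null, (Object) args); // cast so args is passed as one parameter
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot launch main class " + className, e);
        }
    }

    public static void main(String[] args) {
        // Falls back to the stand-in class when the variable is not set.
        launch(System.getenv(), EchoMain.class.getName(), args);
        System.out.println("EchoMain received " + EchoMain.lastArgs);
    }
}

/** Hypothetical stand-in for an updater entry point; records what it was called with. */
final class EchoMain {
    static String lastArgs = "";

    public static void main(String[] args) {
        lastArgs = Arrays.toString(args);
    }
}
```

With this shape, swapping the wrapped updater for a custom one is a configuration change rather than a code change, which is the property the runUpdater.sh idea is after.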

TODO: check with the WDQS team whether there are any WDQS internals I'll mess up by doing this, and get their general thoughts.

Running locally

  1. In IntelliJ IDEA, create a run configuration for org.wikidata.query.rdf.tool.WbStackUpdate
  2. Set VM Options to, for example, -Xmx64m
  3. Set the environment variables to:
WBSTACK_API_ENDPOINT=http://localhost:3030/
WBSTACK_BATCH_SLEEP=0
WBSTACK_LOOP_LIMIT=1000000000
WBSTACK_WIKIBASE_SCHEME=http
  4. Start Docker with docker-compose up
  5. Once everything has initialized, you should be able to run the new configuration.
  6. Every time the fake API gets polled, new items are inserted into Wikibase, and the updater keeps running indefinitely.
  7. (Optional) Use https://visualvm.github.io/ for profiling
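For orientation, the environment variables from step 3 could be read along the following lines. This is a minimal sketch, not the repository's actual parsing code: the class name `UpdaterEnv` is hypothetical, and the fallback defaults are assumptions that simply reuse the example values listed above.

```java
import java.util.Map;

/** Sketch: typed view over the WBSTACK_* variables from the run configuration. */
final class UpdaterEnv {
    final String apiEndpoint;
    final long batchSleep;
    final long loopLimit;
    final String wikibaseScheme;

    UpdaterEnv(Map<String, String> env) {
        // Defaults are assumptions copied from the README's example values.
        apiEndpoint = env.getOrDefault("WBSTACK_API_ENDPOINT", "http://localhost:3030/");
        batchSleep = Long.parseLong(env.getOrDefault("WBSTACK_BATCH_SLEEP", "0"));
        loopLimit = Long.parseLong(env.getOrDefault("WBSTACK_LOOP_LIMIT", "1000000000"));
        wikibaseScheme = env.getOrDefault("WBSTACK_WIKIBASE_SCHEME", "http");
    }

    public static void main(String[] args) {
        UpdaterEnv config = new UpdaterEnv(System.getenv());
        System.out.println("endpoint=" + config.apiEndpoint
                + " sleep=" + config.batchSleep
                + " loopLimit=" + config.loopLimit
                + " scheme=" + config.wikibaseScheme);
    }
}
```

Taking the map as a constructor argument, rather than calling `System.getenv()` inside the class, keeps the parsing easy to exercise with made-up values.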

GitHub Actions Test CI

The test CI runs a Wikibase instance that gets populated by the seeder/ scripts. After some passes of the queryservice-updater, the query service is queried for any inserted rows.

When debugging the CI configuration locally, you can run:

docker-compose -f docker-compose.yml -f docker-compose.ci.yml up

If changes aren't taking effect, you can try removing the image to force a rebuild:

docker rmi queryservice-updater_wdqs-updater