Improved implementation of PrefixMapStd #1475

Aklakan · 2022-08-09T12:42:37Z

GitHub issue resolved #1474

Reimplementation of PrefixMapStd to combine "fast-track" with trie backing.

Tests are included. (All existing tests apply - no new ones created)
Documentation change and updates are provided for the Apache Jena website
Commits have been squashed to remove intermediate development commit messages.
Key commit messages start with the issue number (GH-xxxx or JENA-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.

See the Apache Jena "Contributing" guide.

Aklakan · 2022-08-24T19:38:48Z

This PR is ready for review.

The small extra changes are:
Longest-prefix-lookup-cache invalidation now only happens if there is an actual change; adding a duplicate or removing a non-existent entry does not trigger invalidation.
Also, I wasn't sure about the thread-safety of PrefixMapStd. It's probably better to have it, so I added read-write lock (RWL) locking.
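
The "invalidate only on actual change" behaviour can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: the class `ChangeAwarePrefixMap`, its fields, and the `invalidationCount()` counter are hypothetical names, and a plain `HashMap` stands in for the longest-prefix lookup cache. Locking is left out to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not Jena code): clears its lookup cache only when the mapping really changes. */
final class ChangeAwarePrefixMap {
    private final Map<String, String> prefixToIri = new HashMap<>();
    // Stand-in for the longest-prefix lookup cache (IRI -> abbreviated form).
    private final Map<String, String> lookupCache = new HashMap<>();
    private int invalidations = 0;

    /** Adds or replaces a mapping; the IRI is assumed to be non-null. */
    void add(String prefix, String iri) {
        String old = prefixToIri.put(prefix, iri);
        if (!iri.equals(old)) {
            invalidate(); // new prefix, or same prefix bound to a different IRI
        }
        // Re-adding an identical mapping leaves the cache untouched.
    }

    /** Removes a mapping; removing an unknown prefix is a no-op. */
    void delete(String prefix) {
        if (prefixToIri.remove(prefix) != null) {
            invalidate();
        }
    }

    private void invalidate() {
        lookupCache.clear();
        invalidations++;
    }

    /** Exposed only so the behaviour can be observed in a test. */
    int invalidationCount() {
        return invalidations;
    }
}
```

With this shape, `add("ex", ns)` followed by the same `add("ex", ns)` and a `delete("missing")` invalidates the cache exactly once.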

Is there a place where the utility methods calcWithLock(Lock, Supplier) and runWithLock(Lock, Runnable) could be publicly added - or maybe they already exist in one of the dependencies?
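
For reference, a minimal sketch of what the two helpers could look like. Only the method names and parameter types come from the comment above; the holder class `LockUtils` is a hypothetical name, and this is not necessarily how the PR implements them.

```java
import java.util.concurrent.locks.Lock;
import java.util.function.Supplier;

/** Hypothetical holder class; the name LockUtils is an assumption, not an existing Jena class. */
final class LockUtils {
    private LockUtils() {}

    /** Runs the action while holding the lock; the lock is released even if the action throws. */
    static void runWithLock(Lock lock, Runnable action) {
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }

    /** Computes a value while holding the lock; the lock is released even if the supplier throws. */
    static <T> T calcWithLock(Lock lock, Supplier<T> supplier) {
        lock.lock();
        try {
            return supplier.get();
        } finally {
            lock.unlock();
        }
    }
}
```

With a `ReentrantReadWriteLock rwl`, updates would go through `runWithLock(rwl.writeLock(), ...)` and lookups through `calcWithLock(rwl.readLock(), ...)`.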

Aklakan · 2022-09-17T15:40:40Z

I converted this back to a draft because I have pending updates that I first need to benchmark against the parts of the original PrefixMapStd that already work well (IRIs with / or #), in order to determine whether including them makes sense performance-wise.

The general idea is to have a PrefixMap implementation that auto-adapts to a given workload: for example, parsing lots of queries without the overhead of updating reverse-lookup structures, while updating those structures on demand when writing out RDF.

In essence the direction I am investigating is about building/updating the reverse-lookup (iri-to-prefix) structures lazily (upon abbreviating). This would buffer prefix-iri inserts/deletions similar to your BufferingPrefixMap; upon abbreviate only the delta is materialized into the reverse-lookup structures.
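
A rough sketch of that lazy approach, under stated assumptions: the class `LazyReversePrefixMap` and all its members are hypothetical names, the abbreviation step only splits at the last "/" or "#", and no check is made that the local part is a valid prefixed-name local name. It is meant to show the buffering idea, not the pending implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: the reverse (IRI -> prefix) index is only updated when abbreviate() is called. */
final class LazyReversePrefixMap {
    private final Map<String, String> prefixToIri = new HashMap<>();
    // Snapshot of the mappings that the reverse index currently reflects.
    private final Map<String, String> materialized = new HashMap<>();
    private final Map<String, TreeSet<String>> iriToPrefixes = new HashMap<>();
    // Prefixes whose mapping changed since the last materialization (the "delta").
    private final Set<String> dirty = new HashSet<>();

    void add(String prefix, String iri) {
        String old = prefixToIri.put(prefix, iri);
        if (!iri.equals(old)) dirty.add(prefix); // cheap: no reverse-index work here
    }

    void delete(String prefix) {
        if (prefixToIri.remove(prefix) != null) dirty.add(prefix);
    }

    String get(String prefix) {
        return prefixToIri.get(prefix);
    }

    /** Returns "prefix:local", or null if the namespace is not registered. */
    String abbreviate(String iri) {
        materializeDelta();
        int split = Math.max(iri.lastIndexOf('/'), iri.lastIndexOf('#')) + 1;
        if (split == 0) return null;
        TreeSet<String> prefixes = iriToPrefixes.get(iri.substring(0, split));
        if (prefixes == null) return null;
        // If several prefixes share a namespace, pick the alphabetically first one.
        return prefixes.first() + ":" + iri.substring(split);
    }

    /** Applies only the buffered changes to the reverse index. */
    private void materializeDelta() {
        for (String prefix : dirty) {
            String oldIri = materialized.remove(prefix);
            if (oldIri != null) {
                TreeSet<String> set = iriToPrefixes.get(oldIri);
                set.remove(prefix);
                if (set.isEmpty()) iriToPrefixes.remove(oldIri);
            }
            String newIri = prefixToIri.get(prefix);
            if (newIri != null) {
                materialized.put(prefix, newIri);
                iriToPrefixes.computeIfAbsent(newIri, k -> new TreeSet<>()).add(prefix);
            }
        }
        dirty.clear();
    }

    /** Number of buffered changes; exposed only for illustration. */
    int pendingCount() {
        return dirty.size();
    }
}
```

A parse-only workload that never calls `abbreviate` pays only for the forward map and the dirty set; the reverse index is built the first time something is written out.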

afs · 2022-09-17T16:11:23Z

Another choice is to restrict to the basic case of prefix at the final "/", "#" and ":" (for URNs). Only have the "fast path" abbreviate.

Do you have cases where abbreviation is not one of these?

If you are going for the complicated version, maybe the best way is to have a new PrefixMapCaching and leave PrefixMapStd as it is.

Aklakan · 2022-09-19T17:15:52Z

> Another choice is to restrict to the basic case of prefix at the final "/", "#" and ":" (for URNs). Only have the "fast path" abbreviate.

I think your suggestions of including ':' in the list and only using fast path (without resorting to scanning) would work efficiently and be sufficient for the vast majority of use cases. Without scanning, even relative IRIs (that do not contain any of the fast track chars) wouldn't cause problems.
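
To make the fast-path-only variant concrete, here is a minimal sketch. The class `FastPathAbbreviator` and its methods are hypothetical names, not PrefixMapStd's actual code, and the check that the local part is a legal local name is omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: abbreviation by a single hash lookup at the last '/', '#' or ':'. */
final class FastPathAbbreviator {
    private final Map<String, String> namespaceToPrefix = new HashMap<>();

    void add(String prefix, String namespace) {
        namespaceToPrefix.put(namespace, prefix);
    }

    /** Returns "prefix:local", or null on a miss. There is no fallback scan over all prefixes. */
    String abbreviate(String iri) {
        int split = lastSeparator(iri) + 1;
        if (split == 0) {
            return null; // e.g. a relative IRI without any separator: immediate miss
        }
        String prefix = namespaceToPrefix.get(iri.substring(0, split));
        return prefix == null ? null : prefix + ":" + iri.substring(split);
    }

    /** Index of the last '/', '#' or ':' in the IRI, or -1 if none is present. */
    private static int lastSeparator(String iri) {
        for (int i = iri.length() - 1; i >= 0; i--) {
            char c = iri.charAt(i);
            if (c == '/' || c == '#' || c == ':') {
                return i;
            }
        }
        return -1;
    }
}
```

A miss costs one backwards scan of the IRI string plus one hash lookup, regardless of how many prefixes are registered.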

> Do you have cases where abbreviation is not one of these?

Right now I only have some initial experiments where I abuse the prefix map as a poor man's dictionary encoding in order to reduce the number of bytes that need to be parsed to produce triples/quads. For this I am using trie-based lookups to encode the data (so IRIs can be split anywhere), but I have yet to evaluate whether this actually gives a noticeable performance boost.
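
For illustration, a trie-based longest-prefix lookup that can split an IRI at any position might look like the sketch below. It is a simple character trie with hypothetical names (`PrefixTrie`, `Node`); it is not the trie used in this PR or in the experiments mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: character trie over namespace IRIs with longest-match lookup. */
final class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String prefix; // non-null if a registered namespace ends at this node
    }

    private final Node root = new Node();

    void add(String prefix, String namespace) {
        Node node = root;
        for (int i = 0; i < namespace.length(); i++) {
            node = node.children.computeIfAbsent(namespace.charAt(i), c -> new Node());
        }
        node.prefix = prefix;
    }

    /** Returns "prefix:rest" for the longest registered namespace that starts the IRI, or null. */
    String abbreviate(String iri) {
        Node node = root;
        String best = null;
        int bestLength = 0;
        for (int i = 0; i < iri.length(); i++) {
            node = node.children.get(iri.charAt(i));
            if (node == null) {
                break; // no registered namespace continues along this path
            }
            if (node.prefix != null) {
                best = node.prefix; // remember the longest match seen so far
                bestLength = i + 1;
            }
        }
        return best == null ? null : best + ":" + iri.substring(bestLength);
    }
}
```

Lookup cost is bounded by the length of the IRI, and a namespace such as `http://example.org/data-` can match even though it does not end in "/", "#" or ":".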

Aklakan mentioned this pull request Aug 9, 2022

PrefixMapStd is very slow for lookups that 'miss' #1474

Open

Aklakan force-pushed the gh-1474 branch 2 times, most recently from a422853 to bfa613e Compare August 24, 2022 18:59

Aklakan force-pushed the gh-1474 branch from de369dc to e22cee6 Compare August 24, 2022 19:50

Improved implementation of PrefixMapStd

0c6f0b8

Aklakan force-pushed the gh-1474 branch from b94ec8c to 0c6f0b8 Compare August 30, 2022 13:20

Aklakan marked this pull request as draft September 17, 2022 15:07
