
Reasoning using the semi-naive algorithm #295

Draft - wants to merge 20 commits into base: main
Conversation

@jeswr (Collaborator) commented May 7, 2022

Note - the code is nasty at the moment and needs some love and refactoring, but it does work!

@josd @pbonte - alternative to #293 now implementing a better forward chaining algorithm.

Somewhat surprisingly, it now takes longer to apply RDFS rules over the union of timbl's profile card and the foaf ontology (which is not deep at all - you only need one round of reasoning).

It is far more competitive on the Deep Taxonomy Benchmark now @josd - this is the output of running the Reasoning-perf.js file in this branch. I have not benchmarked test-dl-1000000.n3, as I could not download it from SourceForge.

Reasoning over TimBL profile and FOAF
loading foaf ontology: 28.07ms
loading tim berners lee profile card: 17.824ms
apply reasoning: 45.178ms


Running Deep Taxonomy Benchmark

Load test-dl-10.n3: 2.809ms
Reasoning: test-dl-10.n3: 0.778ms
true

Load test-dl-100.n3: 13.661ms
Reasoning: test-dl-100.n3: 2.647ms
true

Load test-dl-1000.n3: 68.42ms
Reasoning: test-dl-1000.n3: 25.906ms
true

Load test-dl-10000.n3: 289.421ms
Reasoning: test-dl-10000.n3: 113.692ms
true

Load test-dl-100000.n3: 1.799s
Reasoning: test-dl-100000.n3: 675.589ms
true

@jeswr jeswr marked this pull request as draft May 7, 2022 04:17
@pbonte commented May 7, 2022

Nice results @jeswr!
Could you summarise what optimisations you made to increase performance here?
I'd love to hear more!

@jeswr (Collaborator, Author) commented May 7, 2022

> Could you summarise what optimisations

Sure! The key point is to pre-compute as much information as possible from the rules before going to the data - in particular, determining which index to use for each premise, based on which terms in the rule are variables and which variables will already have been substituted by the time we look at a particular term in a premise.

To improve over #294, I now pre-compute the rule dependencies, so that whenever a new implicit fact is found we can substitute it into the 'dependent rules' to create restricted rules that we then need to check.
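To make this concrete, here is a minimal, self-contained sketch of the semi-naive idea over plain { s, p, o } string triples with '?'-prefixed variables. All names here are hypothetical - the actual implementation works on the store's internal integer-ID indexes and pre-computes which rules depend on which facts, rather than scanning every rule as this sketch does.

```js
const isVar = term => term.startsWith('?');
const key = f => `${f.s} ${f.p} ${f.o}`;

// Unify a premise pattern against a concrete fact, extending the bindings.
function unify(pattern, fact, bindings) {
  const out = { ...bindings };
  for (const pos of ['s', 'p', 'o']) {
    const term = pattern[pos];
    if (isVar(term)) {
      if (out[term] === undefined) out[term] = fact[pos];
      else if (out[term] !== fact[pos]) return null;
    }
    else if (term !== fact[pos]) return null;
  }
  return out;
}

// Apply bindings to a pattern, leaving unbound variables in place.
const substitute = (pat, b) =>
  ({ s: b[pat.s] ?? pat.s, p: b[pat.p] ?? pat.p, o: b[pat.o] ?? pat.o });

// Evaluate a rule "restricted" by matching the new fact against premise i;
// the remaining premises join against the full set of facts.
function* evaluateRestricted(rule, i, fact, facts) {
  const seed = unify(rule.premises[i], fact, {});
  if (!seed) return;
  let solutions = [seed];
  for (let j = 0; j < rule.premises.length; j++) {
    if (j === i) continue;
    const next = [];
    for (const b of solutions)
      for (const f of facts.values()) {
        const b2 = unify(rule.premises[j], f, b);
        if (b2) next.push(b2);
      }
    solutions = next;
  }
  for (const b of solutions) yield substitute(rule.conclusion, b);
}

// Semi-naive loop: each round substitutes only the facts derived in the
// previous round (the "delta") into the rules.
function reason(facts, rules) {         // facts: Map from key(f) to f
  let delta = [...facts.values()];      // round 0: all explicit facts
  while (delta.length > 0) {
    const derived = [];
    for (const fact of delta)
      for (const rule of rules)
        for (let i = 0; i < rule.premises.length; i++)
          for (const c of evaluateRestricted(rule, i, fact, facts))
            if (!facts.has(key(c))) {
              facts.set(key(c), c);
              derived.push(c);
            }
    delta = derived;                    // only new facts seed the next round
  }
  return facts;
}

// Example: transitivity of subClassOf (terms abbreviated to bare strings).
const rules = [{
  premises: [{ s: '?a', p: 'sub', o: '?b' }, { s: '?b', p: 'sub', o: '?c' }],
  conclusion: { s: '?a', p: 'sub', o: '?c' },
}];
const facts = new Map([['A sub B', { s: 'A', p: 'sub', o: 'B' }],
                       ['B sub C', { s: 'B', p: 'sub', o: 'C' }]]);
console.log(reason(facts, rules).size); // 3: the two inputs plus A sub C
```

The point of the delta loop is that each round only seeds the rules with newly derived facts instead of re-deriving everything from scratch, which is what makes the deep taxonomy runs scale.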

I'll clean this up so this is easier to understand - but this may not be until later on in the month depending on how I track with other commitments.

@rubensworks (Member)

This is really exciting @jeswr, awesome work!

So is this implementation N3-specific? Or could it also be generalized to run with the same level of performance over other RDF/JS Sources?

@jeswr (Collaborator, Author) commented May 7, 2022

> N3-specific?

This is N3-specific, as it leverages facts about the internal indexes, and it also benefits from not having to convert between internal IDs and RDF/JS term representations.
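For context, this is roughly (and much simplified) what the store keeps internally - it is not part of the public API, so treat this as an illustration rather than a contract. Terms are interned to integer IDs once, and each graph holds three nested indexes over those IDs, which the reasoner can walk directly:

```js
// Much-simplified illustration of N3Store's internals (not its public API):
const internals = {
  _ids:      { 'http://example.org/s': 1, 'http://example.org/p': 2, 'http://example.org/o': 3 },
  _entities: { 1: 'http://example.org/s', 2: 'http://example.org/p', 3: 'http://example.org/o' },
  _graphs: {
    '': {                                    // default graph
      subjects:   { 1: { 2: { 3: null } } }, // S -> P -> O
      predicates: { 2: { 3: { 1: null } } }, // P -> O -> S
      objects:    { 3: { 1: { 2: null } } }, // O -> S -> P
    },
  },
};
// A premise with a known subject walks `subjects`; a known predicate,
// `predicates`; otherwise `objects` - all on integer keys, with no
// RDF/JS Term objects allocated along the way.
console.log(Object.keys(internals._graphs[''].subjects[1])); // ['2']
```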

I expect that the reasoning components I am implementing over in Comunica will be at least 2-3x slower (even after merging all of the optimizations that I have been working on here and over at AsyncIterator) as a result of not having access to these internals.

In terms of generalising to other sources @rubensworks, I have an unpublished forward-chaining algorithm for Comunica that uses similar ideas to what I have done here.

@rubensworks (Member)

> I expect that the reasoning components I am implementing over in Comunica will be at least 2-3x slower (even after merging all of the optimizations that I have been working on here and over at AsyncIterator) as a result of not having access to these internals.

Right, that makes sense.

At some point in time, I would love us to get Comunica to work directly on these internal indexes (comunica/comunica#873), as this would make these kinds of use cases a lot faster.

@jeswr (Collaborator, Author) commented May 7, 2022

> Comunica to work directly on these internal indexes

Indeed - that was the motivation for #289 as well

@rubensworks (Member)

> Indeed - that was the motivation for #289 as well

Oh, interesting, I missed that issue.

@josd commented May 7, 2022

Thanks @jeswr for keeping the deep taxonomy ball rolling 😉

For the moment, however, I can't reproduce your performance results:

$ node Reasoning-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 21.521ms
loading tim berners lee profile card: 18.742ms
apply reasoning: 33.002ms


Running Deep Taxonomy Benchmark

Load test-dl-10.n3: 2.849ms
Reasoning: test-dl-10.n3: 0.912ms
true

Load test-dl-100.n3: 10.211ms
Reasoning: test-dl-100.n3: 38.758ms
true

Load test-dl-1000.n3: 59.463ms
Reasoning: test-dl-1000.n3: 3.490s
true

Load test-dl-10000.n3: 219.026ms
Reasoning: test-dl-10000.n3: 3:29.937 (m:ss.mmm)
true

Load test-dl-100000.n3: 2.063s
^C

So I ^C'd the depth of 100000 as I guess it would have taken 3.5 hours.

Just to make sure that we are on the same page, what I have is

$ git branch
  feat/reasoning-fwd-chain
  master
* semi-naive

$ git log
commit 9c686a3f5f0be052d84750aa932873245684c820 (HEAD -> semi-naive, origin/semi-naive)
Author: Jesse Wright <jesse.wright@anu.edu.au>
Date:   Sat May 7 14:16:21 2022 +1000

    chore: clean up output

commit ef6ba01cc3a6ed3b1508a4c471fdb153c8a7dfa7
Author: Jesse Wright <jesse.wright@anu.edu.au>
Date:   Sat May 7 14:08:47 2022 +1000

    chore: add deep taxonomy benchmark

and running on

======== OS ========

Linux 4.19.104-microsoft-standard x86_64 GNU/Linux

======== CPU ========

processor       : 0
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 1
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 2
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 3
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 4
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 5
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 6
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 7
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz

======== RAM ========

MemTotal:       12911996 kB

@jeswr (Collaborator, Author) commented May 7, 2022

> I can't reproduce your performance results

Weird @josd ... I have cloned this to a new location and managed to reproduce the results I initially posted. You seem to be working from the right commit.

A couple of potential causes for what you are seeing:
a) If you are checking out the existing clone, try running npm install && npm run build again, as it may be running off an old build of the file.
b) Check your version of Node (I'm using v18.1).
c) You may have changed/added more rules in the tests (perhaps I am misunderstanding which rules are expected to be used for this benchmark).

For completeness, my OS specs are

jeswr@debian:~/testing-perf/N3.js$ uname -a
Linux debian 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux

@jeswr (Collaborator, Author) commented May 7, 2022

Also, if you could link to the 1_000_000 deep taxonomy once you have this working, that would be great @josd.

@josd commented May 7, 2022

I started fresh and made a pull request: jeswr#10.
It now works at light speed, but the depth of 1000000 is not yet running to completion:

$ node Reasoning-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 16.441ms
loading tim berners lee profile card: 16.041ms
apply reasoning: 43.532ms


Running Deep Taxonomy Benchmark

Load test-dl-10.n3: 4.164ms
Reasoning: test-dl-10.n3: 1.595ms
true

Load test-dl-100.n3: 12.537ms
Reasoning: test-dl-100.n3: 1.268ms
true

Load test-dl-1000.n3: 64.785ms
Reasoning: test-dl-1000.n3: 37.498ms
true

Load test-dl-10000.n3: 234.705ms
Reasoning: test-dl-10000.n3: 89.63ms
true

Load test-dl-100000.n3: 2.106s
Reasoning: test-dl-100000.n3: 923.296ms
true


<--- Last few GCs --->

[5276:0x6555770]    28000 ms: Scavenge (reduce) 2046.0 (2079.0) -> 2045.5 (2079.2) MB, 7.7 / 0.0 ms  (average mu = 0.315, current mu = 0.302) allocation failure
[5276:0x6555770]    28067 ms: Scavenge (reduce) 2046.3 (2079.2) -> 2045.8 (2079.7) MB, 11.9 / 0.0 ms  (average mu = 0.315, current mu = 0.302) allocation failure
[5276:0x6555770]    28117 ms: Scavenge (reduce) 2046.5 (2079.7) -> 2046.1 (2079.7) MB, 6.0 / 0.0 ms  (average mu = 0.315, current mu = 0.302) allocation failure


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb09980 node::Abort() [node]
 2: 0xa1c235 node::FatalError(char const*, char const*) [node]
 3: 0xcf77be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xcf7b37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xeaf3d5  [node]
 6: 0xeafeb6  [node]
 7: 0xebe3de  [node]
 8: 0xebee20 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xec1d15 v8::internal::Heap::HandleGCRequest() [node]
10: 0xe4f3b7 v8::internal::StackGuard::HandleInterrupts() [node]
11: 0x11fb775 v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x15f0a99  [node]
Aborted

Review thread on src/N3Store.js (outdated):

// return;

let newRules = [];
@jeswr May 7, 2022

Thanks for checking @josd - this is where the memory error would be occurring. I can probably tweak the logic here so that, rather than iterating over all the rules and then buffering newRules, I do this one rule at a time.

As noted earlier, I can't actually download the 1_000_000 benchmark to test this - where do I find it? Thanks for the PR with the 1_000_000 benchmark.

@jeswr

As a future note for myself, I can also reduce the amount that needs to be stored in the buffer by storing variable substitutions and a pointer to a rule, rather than all the generated conclusions.
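A hypothetical sketch of that note (none of these names are in the PR): the buffer holds (rule, bindings) pairs and only expands them into concrete conclusions when drained.

```js
// Hypothetical sketch: buffer (rule, bindings) pairs rather than fully
// materialised conclusions, and expand them lazily when consumed.
const substitute = (pat, b) =>
  ({ s: b[pat.s] ?? pat.s, p: b[pat.p] ?? pat.p, o: b[pat.o] ?? pat.o });

const buffer = [];
const defer = (rule, bindings) => buffer.push({ rule, bindings });

function* drain() {
  while (buffer.length > 0) {
    const { rule, bindings } = buffer.pop();
    yield substitute(rule.conclusion, bindings); // one object built per pop
  }
}
```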

@jeswr May 8, 2022

Interestingly, it turns out that the problem is not the length of this buffer, which in the tests I have run does not appear to exceed a size of 3 on this benchmark.

In addition, this fault occurs when there are 6-7 million triples in the store, yet it can definitely handle a lot more: there are 16.7M triples loaded in the default tests. I'm very bemused as to where the problem is arising...

@jeswr (Collaborator, Author) commented May 9, 2022

> > I expect that the reasoning components I am implementing over in Comunica will be at least 2-3x slower (even after merging all of the optimizations that I have been working on here and over at AsyncIterator) as a result of not having access to these internals.
>
> Right, that makes sense.
>
> At some point in time, I would love us to get Comunica to work directly on these internal indexes (comunica/comunica#873), as this would make these kinds of use cases a lot faster.

I've just run some benchmarks (running https://github.com/comunica/comunica-feature-reasoning/tree/chore/benchmark-forward-chaining/packages/actor-rdf-reason-forward-chaining/perf) in Comunica with some of the AsyncIterator and N3.js optimisations included, and it is 2-3 orders of magnitude slower than the results we are seeing here.

@rubensworks (Member)

> I've just run some benchmarks (running https://github.com/comunica/comunica-feature-reasoning/tree/chore/benchmark-forward-chaining/packages/actor-rdf-reason-forward-chaining/perf) in Comunica with some of the AsyncIterator and N3.js optimisations included, and it is 2-3 orders of magnitude slower than the results we are seeing here.

Do you know what the cause is? Inefficient code? Or just a different algorithm?

@RubenVerborgh (Member)

That is absolutely brilliant. This has been my dream for N3.js ever since I started it in 2011.

@jeswr (Collaborator, Author) commented May 11, 2022

> Does this already take into account the asynciterator changes

Yes - that said, I was doing this in a very 'hacky' manner, copy/pasting the unreleased code locally, since I was in a bit of a rush, so I may have missed a couple of optimizations.

@jacoscaz (Contributor)

@jeswr amazing work. I understand that the source-agnostic version would benefit from some form of shared dictionary encoding as per recent conversations?

Also, would you be able to profile one of those test runs, both for the Comunica version and the RDF/JS version? I think we might be able to learn a lot from that.

@jeswr (Collaborator, Author) commented May 11, 2022

> would you be able to profile one of those test runs

Is something like this what you were thinking, @jacoscaz? (Also be aware that the time scale is logarithmic.)

[image: all_taxonomies]

@jacoscaz (Contributor)

@jeswr I can't seem to zoom in on that image, as it's quite small in resolution, but it looks like a comparison of different reasoning solutions. What I'm thinking about is profiling CPU and memory usage across a single run, using either a sampling profiler (https://nodejs.org/en/docs/guides/simple-profiling/) or something more advanced, depending on how comfortable you are around profilers and whether you have used one before. Some IDEs (I tend to use WebStorm) can automatically hook up to V8's profiler if you configure them to do so in their launch configurations.
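Concretely, the simplest variant from that guide would be something like the following (script name taken from the earlier comments in this thread):

$ node --prof Reasoning-perf.js
$ node --prof-process isolate-0x*.log > processed.txt

and then reading processed.txt to find the hot functions.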

@jeswr (Collaborator, Author) commented May 12, 2022

@RubenVerborgh - I think this is getting close to being ready for review, and whilst I think it can be further cleaned up in the future, I don't know if it is worth putting in the effort to do so until the other optimizations I have planned are in.

Before I rebase and mark it as ready for review, there is one remaining hurdle: getting full test coverage on this line. I included it because I was following a similar pattern to here. Could you please explain why this condition is required in the latter case, since it appears that graphs are removed when they are empty anyway?

I quickly tried running git blame but the most recent change to that line was a variable renaming.

@RubenVerborgh (Member)

Amazing stuff. That's what we need, and what I've wanted to have since the day I started this.

May I suggest implementing this in a separate class?
You could either extend Store (ReasonedStore or so) or use composition (a Reasoner with an embedded store).
I'm suggesting this because I think this code will grow quite a bit over time, and it's good for it to have its own file.

> Could you please explain why this condition is required

It was probably just a way to get around an overzealous linter not liking for … in unguarded by if 😅

@josd commented May 12, 2022

> I'm suggesting this because I think this code will grow quite a bit over time, and it's good for it to have its own file.

I would also very much think so - e.g. adding to the store (as now, i.e. forward chaining), just goal-oriented querying (backward chaining), or a "reasoned" combination of those.

@jeswr (Collaborator, Author) commented May 12, 2022

> May I suggest implementing this in a separate class?

Perhaps this is overcomplicating it, but if I go down that track, is it worth trying to implement it as an RDFJSReasoner class, which can reason over any RDF.Store and performs the indexing optimization from this PR when the input store is an instance of N3Store?
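Something like this hypothetical shape - just a sketch of the dispatch, with both paths left as stubs:

```js
import { Store } from 'n3';

// Hypothetical RDFJSReasoner: reasons over any RDF/JS store, but switches
// to the optimised path when the store is an N3 Store whose internals we know.
class RDFJSReasoner {
  constructor(store) {
    this.store = store;
  }

  reason(rules) {
    return this.store instanceof Store
      ? this.reasonOverN3Indexes(rules)  // fast path: internal integer-ID indexes
      : this.reasonOverMatch(rules);     // generic path: match()/addQuad() only
  }

  reasonOverN3Indexes(rules) { /* the optimisation from this PR */ }
  reasonOverMatch(rules)     { /* works on any RDF/JS Store */ }
}
```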

@jeswr jeswr mentioned this pull request May 13, 2022
@jeswr (Collaborator, Author) commented May 13, 2022

I've pushed the implementation-agnostic RDFJS version to https://github.com/jesse-wright-reasoning-honours/rdfjs-reasoner/blob/master/reasoner.ts. It is tied to neither N3.js nor Comunica.

Happy to clean this up and move the repo to the rdfjs/ organisation, as well as publish it on NPM, if you think that is appropriate.

It might be worth getting some input on its API first, though. In particular, should it be a class with methods like .load (to load rules) and .set (to set the store to apply reasoning to), and should we be working towards an interface specification for @rdfjs/types at the same time?

@rubensworks @RubenVerborgh

@RubenVerborgh (Member)

> I've pushed the implementation-agnostic RDFJS version to https://github.com/jesse-wright-reasoning-honours/rdfjs-reasoner/blob/master/reasoner.ts. It is tied to neither N3.js nor Comunica.

Sweet, how's perf? Should we still do #296?
I'm happy to have it within N3.js, or not if you think it's best independent. But probably best not to have to maintain two.

@jeswr (Collaborator, Author) commented May 13, 2022

> how's perf

@RubenVerborgh The RDFJS version ranges between 2-4x slower:

Reasoning over TimBL profile and FOAF
loading foaf ontology: 13.229ms
loading tim berners lee profile card: 7.711ms
Reasoning: 161.886ms

Running Deep Taxonomy Benchmark
Reasoning: test-dl-10.n3: 0.404ms
Reasoning: test-dl-100.n3: 3.328ms
Reasoning: test-dl-1000.n3: 57.767ms
Reasoning: test-dl-10000.n3: 298.111ms
Reasoning: test-dl-100000.n3: 2.628s
Reasoning: test-dl-1000000.n3: Out of Memory

Unfortunately, there isn't much logic that can be shared either, due to the 'pre-compilation' of index lookups that I'm doing in #296.

@jacoscaz (Contributor)

I think having an RDF/JS reasoner that doesn't depend on N3.js would be worth the 2-4x loss, given the absolute numbers I'm seeing. Don't get me wrong, N3.js is awesome, but enriching the RDF/JS ecosystem with such a wonderful component would be amazing.

@RubenVerborgh (Member)

Same - I'd strategically suggest writing it on N3.js for now. The burden of proof is on us to show that it can all work fast in the browser; once we have that, we can generalize. Will follow up on the reasoning PR.

@rubensworks (Member)

I would also be in favor of focusing on the N3-specific reasoner for now.

I guess once we have an idea of how to tackle #289 (and standardize this at RDF/JS-level), we can port this to an RDF/JS-level reasoner, without any loss in performance.

@jeswr (Collaborator, Author) commented May 13, 2022

> I would also be in favor of focusing on the N3-specific reasoner for now.

Agreed - in further support of this, I am now getting the logic for reasoning in Comunica closer to that of the generic RDFJS algorithm. The primary bottleneck came down to the unnecessary creation of many iterators in the process, which can be resolved by re-organising some code and creating some custom iterators (for reference, these are the variants of iterator usage I have been testing, though I doubt they are of much use to anyone else at the present moment). So an RDFJS implementation separate from Comunica may become somewhat redundant.
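As a toy, synchronous illustration of the iterator point (not the Comunica code, and arrays rather than AsyncIterators):

```js
// One iterator/array per operator allocates and pumps intermediates...
const isTypeQuad = q => q.p === 'rdf:type';
const toBinding = q => ({ '?x': q.s, '?c': q.o });
const toQuad = b => ({ s: b['?x'], p: 'rdf:type', o: b['?c'] });

const viaChain = quads => quads.filter(isTypeQuad).map(toBinding).map(toQuad);

// ...whereas one custom loop makes a single pass with no intermediates.
function viaFusedLoop(quads) {
  const out = [];
  for (const q of quads)
    if (isTypeQuad(q)) out.push(toQuad(toBinding(q)));
  return out;
}
```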

@josd commented May 13, 2022

@jeswr I really love https://github.com/jeswr/N3.js/blob/feat/N3Reasoner/src/N3Reasoner.js and maybe even more your N3Script https://github.com/jeswr/N3.js/blob/feat/N3Reasoner/test/util.js

The performance is one of the best I have ever seen:

$ node --max-old-space-size=8192 N3Reasoner-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 20.782ms
loading tim berners lee profile card: 16.937ms
Reasoning: 50.257ms

Running Deep Taxonomy Benchmark
Reasoning: test-dl-10.n3: 0.265ms
Reasoning: test-dl-100.n3: 1.089ms
Reasoning: test-dl-1000.n3: 17.434ms
Reasoning: test-dl-10000.n3: 109.719ms
Reasoning: test-dl-100000.n3: 1.040s
Reasoning: test-dl-1000000.n3: 12.971s

Thank you so much and Keep Up The Good Work!

@josd commented May 13, 2022

Also very nice performance on a smartphone (https://twitter.com/josderoo/status/1525209755471462402), albeit there was not enough memory to run the deep taxonomy of depth 1000000 (subsumption with 3 million classes).

@josd commented May 14, 2022

Also have a look at the extended deep taxonomy test, which takes the depth as the number of elements in the classes. It is closer to reality, e.g. an ontology with 300000 classes (like SNOMED-CT) and 100000 individuals belonging to those classes (like patients having observations, diseases, treatments, ...).
The modified code is:

$ git diff
diff --git a/test/util.js b/test/util.js
index f9759a9..7e29899 100644
--- a/test/util.js
+++ b/test/util.js
@@ -233,13 +233,15 @@ function load(filename, store) {
 function generateDeepTaxonomy(size) {
   const store = new N3.Store();

-  store.addQuads([
-    new Quad(
-      new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#ind'),
-      new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
-      new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#N0'),
-    ),
-  ]);
+  for (let i = 0; i < size; i++) {
+    store.addQuads([
+      new Quad(
+        new NamedNode(`http://eulersharp.sourceforge.net/2009/12dtb/test#ind${i}`),
+        new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
+        new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#N0'),
+      ),
+    ]);
+  }

   store.addQuads([
     new Quad(

The current results are:

$ node --max-old-space-size=8192 perf/N3Reasoner-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 18.728ms
loading tim berners lee profile card: 15.987ms
Reasoning: 57.744ms

Running Deep Taxonomy Benchmark
Reasoning: test-dl-10.n3: 1.16ms
Reasoning: test-dl-100.n3: 133.68ms
Reasoning: test-dl-1000.n3: 13.335s

<--- Last few GCs --->

[3711:0x696f850]   358474 ms: Mark-sweep 7625.1 (8239.9) -> 7614.2 (8230.2) MB, 6144.4 / 0.0 ms  (average mu = 0.140, current mu = 0.048) allocation failure GC in old space requested
[3711:0x696f850]   364806 ms: Mark-sweep 7630.4 (8239.9) -> 7622.2 (8230.4) MB, 6018.9 / 0.0 ms  (average mu = 0.096, current mu = 0.049) allocation failure GC in old space requested


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb09980 node::Abort() [node]
 2: 0xa1c235 node::FatalError(char const*, char const*) [node]
 3: 0xcf77be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xcf7b37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xeaf3d5  [node]
 6: 0xeafeb6  [node]
 7: 0xebe3de  [node]
 8: 0xebee20 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xec1d9e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
10: 0xe83012 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
11: 0xe7d92c v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawArray(int, v8::internal::AllocationType) [node]
12: 0xe7da05 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::Oddball>, v8::internal::AllocationType) [node]
13: 0x10db57d v8::internal::Handle<v8::internal::NumberDictionary> v8::internal::HashTable<v8::internal::NumberDictionary, v8::internal::NumberDictionaryShape>::New<v8::internal::Isolate>(v8::internal::Isolate*, int, v8::internal::AllocationType, v8::internal::MinimumCapacity) [node]
14: 0x10db7b3 v8::internal::Handle<v8::internal::NumberDictionary> v8::internal::HashTable<v8::internal::NumberDictionary, v8::internal::NumberDictionaryShape>::EnsureCapacity<v8::internal::Isolate>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NumberDictionary>, int, v8::internal::AllocationType) [node]
15: 0x10dbdf4 v8::internal::Handle<v8::internal::NumberDictionary> v8::internal::Dictionary<v8::internal::NumberDictionary, v8::internal::NumberDictionaryShape>::Add<v8::internal::Isolate>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NumberDictionary>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDetails, v8::internal::InternalIndex*) [node]
16: 0x1003d48  [node]
17: 0x108a015 v8::internal::JSObject::AddDataElement(v8::internal::Handle<v8::internal::JSObject>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes) [node]
18: 0x10cdd6e v8::internal::Object::AddDataProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::StoreOrigin) [node]
19: 0x10d2303 v8::internal::Object::SetProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
20: 0x1205826 v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
21: 0x1206da9 v8::internal::Runtime_SetKeyedProperty(int, unsigned long*, v8::internal::Isolate*) [node]
22: 0x15f0a99  [node]
Aborted

So from depth 10000 on it doesn't work anymore.

For EYE and depth 100000 we currently get:

$ swipl -x edtpe.pvm -- --nope http://josd.github.io/eye/reasoning/edt/test-facts.n3 --query http://josd.github.io/
eye/reasoning/edt/test-query.n3
eye --wcache http://josd.github.io/eye/reasoning .. --no-distinct-input http://josd.github.io/eye/reasoning/edt/test-edt.n3 --nope http://josd.github.io/eye/reasoning/edt/test-facts.n3 --query http://josd.github.io/eye/reasoning/edt/test-query.n3
EYE v22.0424.1245 josd
SWI-Prolog version 8.5.11-23-g5f6851705
starting 378 [msec cputime] 376 [msec walltime]
#Processed by EYE v22.0424.1245 josd
#eye --wcache http://josd.github.io/eye/reasoning .. --no-distinct-input http://josd.github.io/eye/reasoning/edt/test-edt.n3 --nope http://josd.github.io/eye/reasoning/edt/test-facts.n3 --query http://josd.github.io/eye/reasoning/edt/test-query.n3

GET http://josd.github.io/eye/reasoning/edt/test-facts.n3 FROM ../edt/test-facts.n3 SC=100000
GET http://josd.github.io/eye/reasoning/edt/test-query.n3 FROM ../edt/test-query.n3 SC=1
networking 2245 [msec cputime] 2840 [msec walltime]
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix : <http://eulersharp.sourceforge.net/2009/12dtb/test#>.

:i100000 a :N100000.

reasoning 482 [msec cputime] 481 [msec walltime]
#2022-05-14T21:57:24.111Z in=400001 out=1 ent=0 step=0 brake=1 inf=33861137 sec=3.105 inf/sec=10905358
#ENDS

2022-05-14T21:57:24.111Z in=400001 out=1 ent=0 step=0 brake=1 inf=33861137 sec=3.105 inf/sec=10905358

@jeswr (Collaborator, Author) commented May 15, 2022

> For EYE

Is it correct to assume that EYE doesn't suffer from the scaling problem that the N3Reasoner does because it is performing backward chaining in this case? I definitely think backward chaining should be added here at some point, so that backward-chaining reasoning can be done on calls like has and match.
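As a very rough, hypothetical sketch of what backward chaining behind has() could look like (rules are only applied to fully ground goals here, which keeps the sketch simple but makes it incomplete for open queries):

```js
const isVar = term => term.startsWith('?');
const key = f => `${f.s} ${f.p} ${f.o}`;
const substitute = (pat, b) =>
  ({ s: b[pat.s] ?? pat.s, p: b[pat.p] ?? pat.p, o: b[pat.o] ?? pat.o });

// Unify a pattern against a ground fact, extending the bindings.
function unify(pattern, fact, bindings) {
  const out = { ...bindings };
  for (const pos of ['s', 'p', 'o']) {
    const term = pattern[pos];
    if (isVar(term)) {
      if (out[term] === undefined) out[term] = fact[pos];
      else if (out[term] !== fact[pos]) return null;
    }
    else if (term !== fact[pos]) return null;
  }
  return out;
}

// Prove a list of goals top-down; yields one bindings object per solution.
// `seen` holds the ground goals on the current path, to block infinite regress.
function* prove(facts, rules, goals, bindings = {}, seen = new Set()) {
  if (goals.length === 0) { yield bindings; return; }
  const goal = substitute(goals[0], bindings);
  const rest = goals.slice(1);
  for (const fact of facts.values()) {           // 1. explicit facts
    const b = unify(goal, fact, bindings);
    if (b) yield* prove(facts, rules, rest, b, seen);
  }
  const g = key(goal);
  if (![goal.s, goal.p, goal.o].some(isVar) && !seen.has(g)) {
    const onPath = new Set(seen).add(g);
    for (const rule of rules) {                  // 2. backward rule step
      const rb = unify(rule.conclusion, goal, {});
      if (!rb) continue;
      for (const _ of prove(facts, rules, rule.premises, rb, onPath)) {
        yield* prove(facts, rules, rest, bindings, seen);
        break;                                   // one proof of this goal is enough
      }
    }
  }
}

// has(): succeed as soon as one proof of the (ground) goal is found.
const has = (facts, rules, goal) => !prove(facts, rules, [goal]).next().done;
```

The point is that the deep taxonomy query only explores the chain of rule applications relevant to the one goal, instead of materialising the full closure.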

> So from depth 10000 on it doesn't work anymore.

This is likely because the array buffering elements is getting too long by this point. One strategy I am thinking of implementing to overcome this is to instead have a single s, p, o index in each .next part of a rule, containing all of the data that is yet to be put through that rule. I'd imagine we can also re-think the logic here to reduce/remove buffering.
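For the record, the index variant could be as simple as this hypothetical sketch (using the same S -> P -> O nesting as the store, so duplicates are dropped on insert):

```js
// Hypothetical pending-facts index for a rule's `next` link: nested
// s -> p -> o objects instead of a flat, ever-growing array.
function enqueue(pending, { s, p, o }) {
  const ps = (pending[s] ??= {});
  const os = (ps[p] ??= {});
  if (o in os) return false;   // this fact is already queued for the rule
  os[o] = null;
  return true;
}

const pending = {};
console.log(enqueue(pending, { s: 1, p: 2, o: 3 })); // true  (newly queued)
console.log(enqueue(pending, { s: 1, p: 2, o: 3 })); // false (duplicate)
```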

@josd commented May 15, 2022

> > For EYE
>
> Is it correct to assume that EYE doesn't suffer from the scaling problem that the N3Reasoner does because it is performing backward chaining in this case? I definitely think backward chaining should be added here at some point, so that backward-chaining reasoning can be done on calls like has and match.

Very much indeed!
The broader picture is that we need metachaining, like https://github.com/josd/eye/tree/master/reasoning/socrates-metachain-f2b or https://github.com/josd/eye/tree/master/reasoning/socrates-metachain-b2f, as:

> There is no principle to tell whether to
> use top-down or bottom-up reasoning.

(last quote from https://josd.github.io/quote.html and thanks to Werner Heisenberg)

@jimsmart (Contributor)

Great work!
