Thanks for the detailed description Thomas! In general I support your proposal and I agree that focus is a good thing. Having too many targets is not helpful, for the reasons you explain well in this post. What you propose is a SPARQL graph database built on a Rust-only stack, with a focus on a single-node engine. That is in my opinion a very good focus and would definitely add something to the semantic web stack. There is clearly a need for a SPARQL endpoint that can be compiled for and run on everything from phones to large servers. I would expect it to work well on systems with many cores, but from my (limited) understanding of Rust, that's part of the design. I also assume ARM support should not be a problem. You point to an interesting paper for the WASM use-case, so dropping that target (at least for now) is not a problem IMO. Using Sled instead of RocksDB might be a bit of a risk right now, but it might also lead to a bigger reward once Sled evolves. I never understood the in-memory use case. In the past 12 years I have used it rarely, if ever. IMO it makes more sense in environments like Node and the browser, so dropping it is perfectly fine for me. If someone wants that, there are alternatives out there like Jena Fuseki and some others. Distributed storage is yet another field I would not go into right now. There are surely use-cases, but right now they are niche and they could be built on existing stores as well.
-
Thanks indeed @Tpt for this detailed account. Thanks also for citing our work on WASMTree, but I find you overly generous to qualify it as a "much better approach". At the moment, SPARQL query answering is much faster with Oxigraph than with our approach. Granted, progress is still possible on our side (especially by integrating more SPARQL processing in the Rust part), but this still requires some significant work... That being said, I understand your wish to focus on Sled, and I think it is indeed a good move. Especially, I must say, if that helps make your SPARQL parser and algebra reusable in other projects, such as Sophia 😉. Self-interest aside, I think the two projects can move on in complementary ways: Oxigraph focusing on one specific and optimized implementation, and Sophia providing the generic traits to make it interoperable with other implementations... PS: I feel your pain about generics ;-)
-
Thanks @Tpt for pinging me for feedback and for the detailed description of the issues and tradeoffs! I agree with the others that the move to Sled-only is well-motivated and should hopefully pay off as an investment that leads to more growth for Sled. My main concern about Sled is that I have observed it using a fair amount of I/O even when idle, which presents challenges for "embedded" deployments that may use SD cards for storage, but hopefully this is something that gets addressed as Sled matures. It is also nice to use in-memory storage for temporary deployments or testing, so having the ability for Sled to handle that from the Oxigraph frontend (without having to manually create a tmpfs filesystem) is great as well.
-
@Tpt Thanks for including me in this discussion. I agree with your approach of focusing on one storage system. As others have mentioned already, the only perceivable risk is the maturity level of Sled, but this decision will likely pay off in the future.
Similar projects - IndraDB
On their README, they state:
Probably a worthwhile project to keep an eye on to learn from what they find with using Sled in the future.
-
Thank you so much for your feedback. I have moved forward with my proposal and implemented it in the
-
Hi @Tpt! Sorry I missed this notification a long time ago! I see now that you've decided on RocksDB, which I totally understand. I do love the romantic notion that we'd have a fully Rust-only RDF graph store, but I get why you chose RocksDB. I'm still reading up on everything that's happened since the last time I was looking at Oxigraph, but would love to contribute and play around with the code some more! I'm still interested in ShEx/SHACL parsing and implementation as well. Glad to see that this is still going!
-
I am ok with not having a Rust-only RDF graph store, but I would like to lend support for the in-memory backend that can be used to run web-based query processing (e.g. https://sparql.gtf.fyi/). This has proven invaluable for the many non-technical people I work with because they don't need to install or run anything: they just bookmark a page in the browser and can run all the queries they want. Sure, it doesn't scale as much, but it doesn't need to. I am also very interested in SHACL processing within the Oxigraph ecosystem. I've done a little work with OWL 2 RL (https://github.com/gtfierro/reasonable). If anyone is interested in digging into SHACL, let me know! I'd love to collaborate.
-
@Tpt In case you haven't seen this yet — it's early days, but after a long time working in private and on the Komora GitHub org, Tyler Neely has started doing sled 1.0 prereleases.
— https://twitter.com/sadisticsystems/status/1684906383227961344
— https://twitter.com/sadisticsystems/status/1685357851210883072
-
I see from the CHANGELOG.md for "[0.3.0-beta.1] - 2022-01-29" that Sled was removed. Reading through this discussion, I am not sure what the final conclusion was. I think a pure-Rust storage implementation is still appealing. Would using Cargo.toml features to include multiple storage backends be feasible?
-
TL;DR: I am considering making Sled the only storage backend for Oxigraph.
Currently Oxigraph provides three different storage systems: an in-memory store, a RocksDB store, and a Sled store.
Providing three different storages is a challenge for Oxigraph development: often features have to be tweaked three times for the three different stores because of their differences. Having something performant requires the use of complex generics everywhere, cluttering the code and making it much harder to read and write. This significantly slows Oxigraph development speed.
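To make the generics cost described above concrete, here is a minimal sketch. It assumes a hypothetical `TripleStorage` trait and two stand-in backends (`HashBackend`, `SortedBackend`); none of these names come from Oxigraph, and triples are reduced to tuples of integer ids for brevity. The point is only that any layer built on top of storage, here a hypothetical `Loader`, has to carry the storage type parameter along.

```rust
use std::collections::{BTreeSet, HashSet};

// Triples of already-encoded term ids; a simplification for illustration.
pub type Triple = (u32, u32, u32);

// Hypothetical storage abstraction, not Oxigraph's actual trait.
pub trait TripleStorage {
    // Returns true if the triple was not already present.
    fn insert(&mut self, triple: Triple) -> bool;
    fn contains(&self, triple: &Triple) -> bool;
}

// Two stand-in backends with different internal layouts.
#[derive(Default)]
pub struct HashBackend(HashSet<Triple>);

#[derive(Default)]
pub struct SortedBackend(BTreeSet<Triple>);

impl TripleStorage for HashBackend {
    fn insert(&mut self, triple: Triple) -> bool {
        self.0.insert(triple)
    }
    fn contains(&self, triple: &Triple) -> bool {
        self.0.contains(triple)
    }
}

impl TripleStorage for SortedBackend {
    fn insert(&mut self, triple: Triple) -> bool {
        self.0.insert(triple)
    }
    fn contains(&self, triple: &Triple) -> bool {
        self.0.contains(triple)
    }
}

// Every layer above storage must carry the `S` parameter.
pub struct Loader<S: TripleStorage> {
    storage: S,
}

impl<S: TripleStorage> Loader<S> {
    pub fn new(storage: S) -> Self {
        Self { storage }
    }

    // Inserts all triples and returns how many were newly added.
    pub fn load(&mut self, triples: &[Triple]) -> usize {
        triples.iter().filter(|t| self.storage.insert(**t)).count()
    }

    pub fn contains(&self, triple: &Triple) -> bool {
        self.storage.contains(triple)
    }
}
```

With a single backend, `Loader` could hold a concrete type and the `S` parameter would disappear from every signature that touches it.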
The in-memory store is currently very simplistic (global lock, copies on lookups...). The RocksDB store is efficient but very slow to compile (1-2 minutes) and has limited transaction features. The Sled store provides faster reads than RocksDB and competitive writes, but it is not stable yet and uses more disk space. A benchmark between RocksDB and Sled is provided at the end of this post.
However, Sled 1.0 and storage stability are coming soon, so I believe that making Oxigraph "Sled-only" might be a suitable choice.
Pros:
Cons:
About WASM, it seems that much better approaches than Oxigraph are in progress for browsers, and there seem to be people interested in making Sled work with WASI. So, it might be possible to add back NodeJS support in a few months/years.
Querying on distributed storage is very different from the local use-case, so it is reasonable to move it out of Oxigraph's scope. Implementing efficient distributed querying in Oxigraph would mean completely reworking the query evaluation system anyway, so it might make sense to leave this work to another project.
If we go "Sled-only" in Rust, we should make the SPARQL parser and algebra reusable without a dependency on Oxigraph, just like it has been done for the RDF parsers, in order to encourage other SPARQL implementations.
To replace the most common `MemoryStore` use-cases, it might also be relevant to provide simple in-memory data structures without SPARQL support for graphs and datasets, just like toolkits such as RDF4J or Jena do.

@gtfierro @pchampin @edmondchuc @dougli1sqrd @ktk @dwhitney Sorry for reaching out to you directly. You have interacted with me on Oxigraph in the past. Do you have any opinion on my proposal, or do you see challenges I have not considered?
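As a rough illustration of the "simple in-memory data structures without SPARQL support" idea mentioned above, a graph could be little more than a set of triples with wildcard pattern matching. This is only a sketch under simplifying assumptions: `SimpleGraph` and its methods are hypothetical names, terms are plain strings rather than typed IRIs, literals and blank nodes, and lookups are linear scans with no indexes.

```rust
use std::collections::BTreeSet;

// A triple of plain strings; real toolkits use typed terms instead.
pub type Triple = (String, String, String);

// Hypothetical minimal in-memory graph: a set of triples, no SPARQL.
#[derive(Default)]
pub struct SimpleGraph {
    triples: BTreeSet<Triple>,
}

impl SimpleGraph {
    pub fn new() -> Self {
        Self::default()
    }

    // Returns true if the triple was not already in the graph.
    pub fn insert(&mut self, s: &str, p: &str, o: &str) -> bool {
        self.triples
            .insert((s.to_owned(), p.to_owned(), o.to_owned()))
    }

    pub fn len(&self) -> usize {
        self.triples.len()
    }

    // Returns all triples matching the pattern; `None` is a wildcard.
    // This is a linear scan, which is fine for small graphs only.
    pub fn matching(&self, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<&Triple> {
        self.triples
            .iter()
            .filter(|t| {
                s.map_or(true, |s| t.0 == s)
                    && p.map_or(true, |p| t.1 == p)
                    && o.map_or(true, |o| t.2 == o)
            })
            .collect()
    }
}
```

A structure like this covers loading a small file and walking its triples; anything that needs joins or query planning would still go through the full store.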
Benchmark: BSBM explore+update
The dataset used in the following charts is generated with 10k "products" (see its spec), which yields 3.5M triples. The benchmark was run on a PrevailPro P3000 with 32GB of RAM.
The systems compared are the latest version of Oxigraph with RocksDB, Oxigraph with the current in-development version of Sled, Blazegraph 2.1.5 and GraphDB 9.3.3.
The parallelism factor is 5.