Serializing large lists is slow #22

Open
ArthurWD opened this issue May 2, 2019 · 2 comments

Comments

@ArthurWD
Contributor

ArthurWD commented May 2, 2019

Dumping this list as N-Triples finishes almost instantly: RDF::List(*(0...100)).dump(:ntriples).

By contrast, RDF::List(*(0...100)).dump(:n3) is very slow.

My previous PR #21 improved performance a bit, but later commits reduced it again.

Running Benchmark.measure { RDF::List(*(0...100)).dump(:n3) }.to_s on different commits (columns are user, system, total, and real time in seconds):
281f707 => 0.477319 0.024164 0.501483 (0.611887)
532485c => 2.693448 0.048810 2.742258 (3.432530)
f2938bc => 5.003711 0.026643 5.030354 (6.697221)

The more items in a list, the slower the serialization.
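To see how the time grows with list size, a small harness along these lines might help. It is only a sketch: measure_scaling is a hypothetical helper name, and the block below uses a stand-in workload so it runs without the rdf-n3 gem installed.

```ruby
require 'benchmark'

# Hypothetical helper: times the given block once per size and returns
# a hash of { size => real_seconds }.
def measure_scaling(sizes)
  sizes.each_with_object({}) do |n, results|
    results[n] = Benchmark.realtime { yield n }
  end
end

# Stand-in workload (an assumption, not the real serializer). For this issue
# the block body would instead be: RDF::List(*(0...n)).dump(:n3)
timings = measure_scaling([10, 100, 1000]) { |n| (0...n).map(&:to_s).join(' ') }

timings.each { |n, secs| puts format('%5d items: %.6f s', n, secs) }
```

If the real time grows much faster than the list size, that would point at repeated work per list item rather than a fixed overhead.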

Do you have any ideas on how to improve the performance of large lists?

@gkellogg
Member

gkellogg commented May 2, 2019

Well, the writer did get a lot of work to be able to write out full N3 Formulae, vs. just Turtle, but most of that shouldn't have come into play. My suspicion is in this block:

rdf-n3/lib/rdf/n3/writer.rb

Lines 370 to 376 in 234f7b2

list_elements = @lists.values.map(&:to_a).flatten.select(&:node?).compact
# Sort subjects by resources over bnodes, ref_counts and the subject URI itself
recursable = (@subjects.keys - list_elements).
  select {|s| !seen.include?(s)}.
  map {|r| [r.node? ? 1 : 0, ref_count(r), r]}.
  sort

I'll look into it later this week.
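If that block does turn out to run many times per serialization (not confirmed in this thread), one possible direction would be to build the set of list elements once and filter against it. The following is a rough sketch with plain symbols and hashes standing in for the writer's @lists, @subjects and seen; none of these stand-ins are the actual rdf-n3 data structures.

```ruby
require 'set'

# Hypothetical stand-ins for the writer's state (assumptions for illustration):
# lists maps a list head to its member nodes, subjects maps each subject to
# its properties, and seen holds subjects that were already written.
lists    = { l1: [:b1, :b2, :b3], l2: [:b4] }
subjects = { s1: {}, b1: {}, b2: {}, s2: {}, b4: {} }
seen     = Set[:s2]

# Build the exclusion set once, outside any per-subject loop, so each
# membership check is a hash lookup instead of a fresh flatten of all lists.
list_elements = lists.values.flatten.to_set

# Keep only subjects that are neither list members nor already seen.
recursable = subjects.keys.reject do |s|
  list_elements.include?(s) || seen.include?(s)
end
```

Whether this helps depends on how often the original block is re-evaluated, so it would be worth profiling before changing the writer.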

@gkellogg
Member

gkellogg commented May 3, 2019

Note that RDF::List(*(0...100)).dump(:ttl) is fairly slow too, but not quite as slow as :n3.
