Problem
The existing metadata caching strategy uses more space in /var than it should, leaving both compressed and uncompressed copies of the same session metadata in /var/lib/wyng.
Possible mitigations
- Use an aging algorithm to remove `manifest` and/or `manifest.z` files from the /var cache on Wyng exit. They can be automatically retrieved and/or decoded as needed. The user may control the aging with a `--meta-reduce` option.
- Find alternatives to the unix `sort --merge` tool (see Explanation below).
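The aging idea in the first item could look roughly like the sketch below. It is not Wyng's code: the cache layout, the file names it matches, and the age-in-days rule are all assumptions, and the function name is hypothetical.

```python
import time
from pathlib import Path

# Hypothetical names, modeled on the manifest and manifest.z files described
# above; Wyng's actual cache layout and aging rules may differ.
CULLABLE_NAMES = {"manifest", "manifest.z"}
SECONDS_PER_DAY = 86400


def cull_aged_metadata(cache_root, max_age_days, now=None):
    """Delete cached manifest files not modified within max_age_days.

    Returns the removed paths. Culling is safe only because the archive
    keeps the authoritative copy, so a removed file can be re-fetched or
    re-decoded later.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    # sorted() materializes the listing before any file is deleted
    for path in sorted(Path(cache_root).rglob("*")):
        if path.is_file() and path.name in CULLABLE_NAMES:
            if path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path)
    return removed
```

Using modification time as the age signal keeps the check cheap; a real implementation might instead track last access per session.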
Explanation
Currently Wyng stores an unencoded copy of each session manifest because the unix `sort --merge` tool is used to merge them (a merge is required to build a complete picture of the volume for any referenced session except the oldest one). `sort` is very fast, which is a requirement here, but it cannot read compressed files directly without expensive shell tricks, so the manifests take up extra space in uncompressed form on local disk.
Ideally, there should be a merging tool that could use the encoded/compressed manifests directly, decoding them on-the-fly as needed.
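Decoding on the fly could look roughly like this. The sketch assumes each `manifest.z` is a single zlib stream of `hash address` lines already sorted by address (the real encoding may differ), and streams each file through `zlib.decompressobj()` so no uncompressed copy is written to disk. Both helper names are hypothetical.

```python
import heapq
import itertools
import zlib


def iter_zlib_lines(path, chunk_size=64 * 1024):
    """Yield text lines from a zlib-compressed file, decoding incrementally."""
    decomp = zlib.decompressobj()
    pending = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            pending += decomp.decompress(chunk)
            # Keep the trailing partial line until more data arrives
            *lines, pending = pending.split(b"\n")
            for line in lines:
                if line:
                    yield line.decode()
    pending += decomp.flush()
    for line in pending.split(b"\n"):
        if line:
            yield line.decode()


def merge_compressed_manifests(paths_oldest_first, out):
    """Merge sorted per-session manifests; the newest entry per address wins."""
    streams = [map(str.split, iter_zlib_lines(p)) for p in paths_oldest_first]
    merged = heapq.merge(*streams, key=lambda fields: fields[1])
    for _, group in itertools.groupby(merged, key=lambda fields: fields[1]):
        *_, newest = group  # heapq.merge keeps input order on ties: last = newest
        out.write(" ".join(newest) + "\n")
```

Only a small buffer per session is held in memory at a time. This sketch still pays the per-line Python overhead of the heapq experiment, plus the cost of decompression.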
Using heapq and itertools
As a first experiment, I replaced `sort` with this test code, executed in the main section of Wyng:
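```python
import heapq
import itertools
# vol and outf come from Wyng's main section; vol.sesnames is assumed oldest-to-newest
mfnames = (vol.sessions[s].path + "/manifest" for s in vol.sesnames)
maplist = [map(str.split, open(x)) for x in mfnames]
# Merge on the address field (2nd column); per address, print the last (newest) entry
for _, group in itertools.groupby(heapq.merge(*maplist, key=lambda x: x[1]), key=lambda y: y[1]):
    print(" ".join(list(group)[-1]), file=outf)
```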
With 133 sessions in the volume, this test took 1.93 seconds on average, about 4X as long as the existing `merge_manifests()` routine. The manifest sources were unencoded, so this result doesn't include the decoding overhead that would eventually be added. So much for that.
I don't know whether the approach above could be tweaked, or whether there are better approaches to handle this in Python.
I'm open to suggestions!
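One variation worth benchmarking (not measured against Wyng's real manifests): merge the sessions newest-first, so the first entry seen for an address is the one to keep, and drop the `groupby` and per-group `tuple()` construction. `heapq.merge` breaks key ties by the order of its input iterables, which is what makes newest-first ordering work; `operator.itemgetter` replaces the lambdas. The function name is hypothetical, and each input must already be sorted by address, just as `sort --merge` requires.

```python
import heapq
from operator import itemgetter


def merge_newest_wins(manifests_oldest_first):
    """Yield one 'hash address' line per address, keeping the newest entry.

    manifests_oldest_first is a list of line iterables (e.g. open files),
    each sorted by its second field.
    """
    by_address = itemgetter(1)
    # Newest first: on equal addresses, heapq.merge yields the newest entry first
    streams = [map(str.split, m) for m in reversed(manifests_oldest_first)]
    last = None
    for fields in heapq.merge(*streams, key=by_address):
        if fields[1] != last:  # skip older duplicates of the same address
            last = fields[1]
            yield " ".join(fields)
```

Whether this closes the roughly 4X gap is unknown; the cost of splitting every line remains.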
I've already reduced manifest disk usage by doubling the default archive chunk size, which halves the number of manifest entries. The filesystem compression attribute is also now enabled, which offers some reduction on /var filesystems that support it, such as Btrfs.
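The halving follows from a manifest holding at most one entry per chunk. A quick check with illustrative numbers: the 100 GiB volume and the 64 KiB and 128 KiB chunk sizes are examples, not necessarily Wyng's old and new defaults, and the helper name is hypothetical.

```python
def max_manifest_entries(volume_bytes, chunk_bytes):
    """Upper bound on entries in a full-volume manifest: one per chunk."""
    return -(-volume_bytes // chunk_bytes)  # ceiling division


GiB, KiB = 2**30, 2**10
# Illustrative sizes only
for chunk in (64 * KiB, 128 * KiB):
    print(f"{chunk // KiB} KiB chunks: {max_manifest_entries(100 * GiB, chunk):,} entries")
# 64 KiB chunks: 1,638,400 entries
# 128 KiB chunks: 819,200 entries
```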
Update: The aging feature has also been implemented. Using `--meta-reduce=on:0` should lower the /var footprint by about 2/3.
Given how effective `--meta-reduce` is, I consider the basic goal of this issue met. Although /var usage still balloons somewhat at runtime, much of that will eventually move to /var/cache, so it isn't much of an issue. Outside runtime, disk usage is now much more compact, and disused archive dirs no longer retain their session data for long periods. Finally, Wyng now fetches metadata from archives as needed, regardless of whether Wyng or the user culls the /var metadata.
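Fetching metadata as needed can be pictured as a fallback chain. The sketch below is not Wyng's actual code: it assumes a session directory that may hold `manifest` and/or `manifest.z` (assumed zlib-compressed), and `fetch_from_archive` is a hypothetical callback standing in for Wyng's archive access.

```python
import zlib
from pathlib import Path


def load_manifest(session_dir, fetch_from_archive):
    """Return manifest text, rebuilding whatever the local cache lacks.

    Hypothetical lookup order: plain manifest, then manifest.z decoded
    locally, then a fresh compressed copy fetched from the archive.
    """
    session_dir = Path(session_dir)
    plain = session_dir / "manifest"
    packed = session_dir / "manifest.z"
    if plain.exists():
        return plain.read_text()
    if not packed.exists():
        # fetch_from_archive(name) is assumed to return zlib-compressed bytes
        session_dir.mkdir(parents=True, exist_ok=True)
        packed.write_bytes(fetch_from_archive(session_dir.name))
    # Decode in memory only; no uncompressed copy is written back to /var
    return zlib.decompress(packed.read_bytes()).decode()
```

Because every step can be rebuilt from the next one down, it does not matter whether an aging pass or the user removed the local files.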