Bug: lvcreate error 5 when sending #184
Wyng version 0.8beta 20240411 on Qubes 4.x

When preparing lvm snapshots for a full scan, the `lvm lvcreate` command exits with rc 5. See report in forum message: https://forum.qubes-os.org/t/re-ann-wyng-incremental-backup-new-version/25801

Troubleshooting

Try adding `-w debug` to the `wyng-util-qubes` command line and then posting the error log contents (`sudo less /tmp/wyng-debug/err.log`). It should be possible to copy/paste the text by clicking on the clipboard widget ("Copy dom0 clipboard").
Comments
Re the last line: the .tock volume is in neither the new archive nor /dev/qubes_dom0.
Investigating further:
After …
Then: …
So I run …
Then I try: …
I had to …
So once a qube is backed up into an archive through wyng-util-qubes, it can't be backed up to another archive? And what if that archive is deleted?
After removing metadata files from /var/lib/wyng and re-initiating the remote archive:
As a very separate side note: using wyng-util-qubes, I cannot pipe the password file or include it in passcmd=, as there is still a second prompt from wyng (seen above).
Last post, apologies; wanted to investigate before calling it a day. I duplicated debian-12-minimal, and into the same archive I created a backup.
Perhaps this is user error, but I have some concerns that I hope you can address:
As mentioned in our other thread, I really appreciate all your work on this and look forward to the full release. I'll keep working with it in the meantime and will help improve it however I can.
@kennethrrosen OK, that's a lot of good feedback and investigation. Thanks!

For starters, I think Thin LVM had an internal hiccup and lost track of which volumes were available. Sometimes bringing the volgroup offline then online, or rebooting, can resolve this. If there is an avoidable cause of the LVM problem, apart from Wyng snapshots on top of Qubes snapshots adding a bit more stress to Thin LVM, the answer is almost always to increase the qubes_dom0/vm-pool metadata size.

On remap: that can also be passed to Wyng as an option.

On passphrase prompts: you can pass the --passcmd through to Wyng.

The 'atexit callback' has been fixed in the 08wip branch. More in next comment...
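The exact commands were stripped from this excerpt. A minimal sketch of the metadata-resize step, assuming the Qubes-default volume group `qubes_dom0` and pool `vm-pool` (the +256M amount is illustrative):

```
# inspect current thin-pool data and metadata usage
sudo lvs -o lv_name,data_percent,metadata_percent qubes_dom0

# enlarge the thin pool's metadata LV
sudo lvextend --poolmetadatasize +256M qubes_dom0/vm-pool
```

Keeping metadata_percent comfortably below 100% avoids the kind of lvcreate failures reported here.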
Should be addressed above. Wyng is a snapshot manager, and a snapshot must be associated with an archive (actually a point in time in an archive). But LVM is ill-suited to juggling multiple snapshots per volume. I find that most users never encounter remapping issues because they tend to use one archive per system.
Yes, wyng-util-qubes will create qubes (I hate saying that; confusion with the OS name), AKA VMs, as necessary and restore their settings. It stores the Qubes XML metadata in the archive and retrieves it in much the same way Qubes' own backup tool does. In rare cases, conflicts may arise because you currently have a special Standalone VM named 'sys-net' where the archive contains a regular appVM of that same name (the classification of the existing VM can't be changed). This would require manual intervention, such as renaming the Standalone VM to something else and re-doing the restore.
For non-dom0 VMs, nothing more than a bit of care is needed. However, you need to supply/restore the templates and netVMs that those VMs rely on. IOW, it's best to restore templates and netVMs (like sys-net) first if that's necessary (OTOH, the default installed templates and/or netVMs may be all that your other VMs need to function). The util will restore templates first to help avoid issues with this. But if your ducks aren't lined up at restore time, you may need to try more than once (this is another reason to be cautious with multiple archives... you may put necessary templates or netVMs in an archive that you forgot about). I sometimes get asked about directly restoring dom0 itself. That is a can of worms (and not directly supported), but it doesn't mean you can't back up the dom0 root with Wyng (not the util) if it's sitting in a thin LVM volume.
FWIW, I just tried successfully with the following pipe using Wyng rel 20240411:
You were using Wyng rel 20240403 and probably didn't specify --authmin, but that Wyng version defaults authmin to '0'. The util needs to run Wyng multiple times for certain ops, so it uses the passphrase at first and then the next run sees no key agent so it prompts you instead. (So... use the newer Wyng release ;) ) This also worked:
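(That second working example did not survive this excerpt.) For illustration, a plausible shape for an invocation that avoids the repeated prompt, assuming the util's `-w` passthrough works as it does for `debug` elsewhere in this thread; the destination URL and authmin value are made up:

```
# cache the unlocked key so the util's repeated wyng runs don't re-prompt
sudo wyng-util-qubes backup --dest=ssh://user@host/mnt/backups/qubes.backup -w authmin=300
```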
Thank you, @tasket! I've been testing backups with throwaway VMs over the last few hours and am much less afraid than I was during that initial period when the lvm pool wigged out over a dearth of metadata space. For the script I've established to run wyng-util-qubes intermittently, I'm looking at the wyng.ini file for authmin and have a var for the passphrase now. dom0 is out of the scope of what I was hoping to do. And I haven't yet run into issues with dependencies re templates or netvms; my use case is a very minimal system that's transportable and quickly restored in case of total loss. This has all been very helpful. I think we can consider this closed, but if anything further arises I'll post it here or in the forum.
Considering the QubesOS default is LVM, these are the parameters I'm working under. @tasket, having now been using the tool properly for a few days, what I wondered about remapping (understanding that it threatens the inherent speed of the backups) was whether using remap often would be damaging to the snapshot or the archive itself. In the documentation you note that it's possible to back up the backup: duplicate the archive in another location, remote or local, for redundancy's sake. Is that faster and more efficient than having wyng back up to multiple archives (one local, one remote)? I apologize if all of this would simply be negated were I to reinstall Qubes under Btrfs; perhaps when I have a day to set aside in a month or two.

Edit: also keeping an eye on this: QubesOS/qubes-issues#6476 (comment)
With versions …
@kennethrrosen I've never encountered this error being triggered before. It's likely that something is mangling the data. Just to be sure, try using the updated Wyng 08wip version I just posted. It will provide more explicit details.
@kennethrrosen No need to answer the questions in the last comment. I can see a code path that would produce the error when a metadata file is especially small and also uncompressible. The size check doesn't have enough margin to allow for that. I've added the necessary margin to avoid that error condition; see today's update to 08wip branch.

Also, I was just able to reproduce the error (without the modification) by adding a very small 'volume' to the archive, then deleting the local metadata and then doing a verify.
Thank you. 08wip seems to work now, and I am getting these alerts before the snapshots are sent:
Was it something I may have done in the remote directory? Though, really, I've done nothing since the initial issue was resolved and have had the backups running on a bi-weekly script.
This could be triggered by deleting /var/lib/wyng metadata (although it shouldn't persist if you stop deleting it). Is it happening for all volumes? I was experiencing a similar issue with LVM volumes recently (without deleting metadata); IIRC I applied a fix but I'll have to revisit that code to see what is going on. I would like to reach a point where full scans aren't usually necessary even when /var/lib/wyng is deleted.
I have not touched the metadata since it began backing up without issue, though it seems it has selectively skipped the -root but not the -private volume for one of the VMs.
It will always skip -root for appVMs, since those only borrow their root volumes. You would expect to see -root vols for template VMs.
The following occurs at the end of a backup. One Standalone in particular is now not backing up -- seems the process just hangs at the private volume backup.
@kennethrrosen Probably due to a communication error with remote? Unless the URL is of the file: or qubes: type. Is the standalone backup attempt initial/full, or incremental? You can run the util with `-w debug` and then check the error log.
PS - If the dest URL is one of the 'ssh' types, then the wyng-rpc log will be on the remote system, not the VM.
@tasket it does seem to have been a network-related issue; when I switched to a LAN connection it eventually finished. This was a full dedupe and the private volume had changed by 20010.5MB, so it was rather significant. Separately, something unaddressed from earlier: is it not advisable to --remap for every sync? For instance, when I'm traveling with poor internet, it doesn't make sense to back up remotely, so I'll back up to an offline VM. When I reconnect to a stable connection, that VM then syncs to the remote server. But if I have two separate archives (one on the VM and one on the remote client) I'd have to remap each time. Is there a way to navigate this as yet?
FWIW, there is no risk to using --remap.

As for retaining a high degree of efficiency... I'm assuming your two archives have somewhat different VM selections because of local space or other issues? If not, and the two have essentially the same VMs, you could do an …

One thing I've done to keep my dom0 home & config backed up is a script that sends /home and /etc to an offline VM, using …

Here is the current script, which would need adaptation:
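The script itself was lost in this excerpt; what follows is a hypothetical reconstruction of the idea (streaming dom0's /home and /etc into an offline VM), with the VM name and destination path invented for the example:

```
#!/bin/bash
# Stream a compressed tar of dom0's /home and /etc into a file on an offline VM.
BACKUP_VM=backup-vault                 # assumed VM name
STAMP=$(date +%Y%m%d)
sudo tar -cz /home /etc | qvm-run --pass-io "$BACKUP_VM" \
    "cat > /home/user/dom0-backup-$STAMP.tar.gz"
```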
Another idea is to simply do …

A related idea is to use Qubes' revisions_to_keep setting for VM volumes. I don't know how high this setting can go ('2' is the highest I've seen), but it might form the basis for easily replaying a VM through a history of changes while backing up each progression.

The latter two ideas of course put a strain on LVM metadata resources, and they don't provide compression. (I should probably repeat that this conundrum essentially goes away on Btrfs, where Wyng simply keeps separate snapshots for each archive.)
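For reference, the revisions_to_keep setting mentioned above is managed per volume with qvm-volume; for example (VM name illustrative):

```
# show current settings for a VM's private volume
qvm-volume info work:private

# keep two automatic revisions of that volume
qvm-volume config work:private revisions_to_keep 2
```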
@tasket When I recently attempted to restore a single VM (no session) I receive the prompt "VMs selected [VMNAME]" and then wyng exits. (Response to your previous comment to come once I've sorted the restore.) As a separate piece of feedback, whenever I stop wyng I can't run it again without restarting dom0, because it says wyng is already running and …

Here is the error log output:
@kennethrrosen What does …
@kennethrrosen There may be a bug in the way the util is filtering root vs private volumes for each VM. What is the VM type, and if it's a template or standalone, what is the session overlap between the two lists (assuming you're using LVM storage):
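The two list commands were stripped from this excerpt; presumably they compared local thin LVM volumes against the archive contents, something like the following (volume group name assumes the Qubes default, destination URL invented, wyng syntax approximate):

```
# volumes present in the local thin pool
sudo lvs qubes_dom0

# volumes recorded in the archive
sudo wyng list --dest=ssh://user@host/mnt/backups/qubes.backup
```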
This VM (I have not tried in this session to restore any others) is a disposable template AppVM.
@tasket if I do a fresh install (the reason for my testing the backups beforehand) and repartition with btrfs, will the wyng archive, or directories in dom0, need modification? I read in the forum that one would need to create a subvolume, like so:
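The forum snippet didn't survive this excerpt, but creating a Btrfs subvolume for Wyng's --local pool would ordinarily be a one-liner, e.g. (the path is only an example):

```
sudo btrfs subvolume create /var/lib/wyng-pool
```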
@kennethrrosen I need to do more testing with disp templates; I've only ever backed these up once or twice. You can force the util to include them with the … option.
Yes, I need to put this in the Readme. The --local spec has to point directly to a subvolume (even if it's the primary/default one). The reason is that subvol transaction activity must be checked just before and after CoW metadata is acquired.

Also, once you have restored to Btrfs, the volume names will no longer match the archive vol names. The util does not yet offer to rename the volumes (something I'm working on), so your best approach for resuming backup procedures in the same archive is to turn on deduplication; this will avoid re-sending all the current data to the archive. Of course, you could manually rename the volumes instead.
@tasket below are the commands I use in a script on my lvm machine, for backup and restore respectively:
How would I modify these, assuming I'm first backing up from the lvm machine, and then restoring to a btrfs machine (assuming also that I've already completed the subvolume change)? |
@kennethrrosen The thing I'd recommend for …

Your backup routine already uses deduplication, so that's good. In a couple of days I should have the re-naming code done and tested, in which case dedup won't be necessary. Also, I didn't mean to imply you should be using …

IIRC, when you use the Qubes Btrfs installer default, it makes everything one big filesystem, and the Qubes pool it creates will be the system default where newly created (restored) VMs will go. After you finish installing, look at the disk widget in the systray: you should see one entry for the kernels and one called 'varlibqubes' with most of the disk space you allocated. The util restores to the right pool automatically (i.e. the system default pool) unless you have a custom partition scheme, in which case you might have more than one Qubes pool (in my case I had to use …)
@tasket I will make that change to the restore command. Thanks! Is …
Then creating a subvolume isn't necessary if the util will manage it independently?
This might be the best solution, as my script already rsyncs from other vms to the local offline backup, and I likewise tar dom0's /home, /etc and /srv directories. I very much appreciate you providing your script and also the continued assistance. I'll await the re-naming code update before once more testing the restore of the disposable template VM and migrating to btrfs.
@kennethrrosen That …

Also, I think I just re-created the restore bug you were experiencing: the restore simply stops without reporting any details and exits silently with return code 2. This is a bug in Wyng not handling the …
That would be sort of like trying to re-create an LVM volgroup from a backup. So the answer is 'No'. However, I could see including a manually-invoked option for this somewhere. The Btrfs subvol is not a part of Qubes' concept of 'reflink storage pool' (although it might be someday) so having a Qubes pool defined doesn't bestow any subvols on us. I might in future allow operation without a specific subvol, if some people want the option of flying without it... however there are many situations where that would cause Wyng to stop and say "I can't process this fs metadata".
I'll try to have it all better-tested tomorrow and hopefully ready for a new beta release mid-week.

These issues all seem to be addressed, so I'm closing this one out. Thanks for the feedback!
@tasket I've just got around to testing, and I get the following error before wyng exits.
The same error appears when using the main branches, too:
@kennethrrosen The package to install in dom0 is 'python3-zstd' like so:
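The install command was stripped from this excerpt; since dom0 packages are normally installed via the Qubes update proxy, it was presumably:

```
sudo qubes-dom0-update python3-zstd
```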
@tasket, encountered on restore:
@kennethrrosen This is the immediate error:
You could check to see if this data chunk really is 131108 bytes on disk. In order to find the path, just …

If you run the restore with …
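A generic way to locate the chunk file and confirm its on-disk size, assuming part of the chunk's name is visible in the error output (the angle-bracket placeholders are hypothetical):

```
# search the archive directory tree for the chunk and print "size path"
find /mnt/backups/<archive-dir> -name '*<chunk-name>*' -printf '%s %p\n'
```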
@kennethrrosen I've updated the guidance for safely making duplicate archives. See https://github.com/tasket/wyng-backup#tips--caveats |
@tasket I haven't made a duplicate archive in this case. It is still the lvm system pulling from the archive I inited at the outset. Great to see the duplicate archive tips section; I will post the error log shortly.
The data chunk is indeed this size. Here is the output from …:
@kennethrrosen Nothing to go on in that log. There could be something in the receive.log that's located on the remote system under /tmp/wyng-rpc/xxxx, where xxxx is the latest tmp subdir (random). Running with …

The circumstances, receiving a partial message before terminating, suggest that something in the VM or remote environment could be putting ssh into a 'pty' mode which reacts to escape sequences (thus interpreting part of your data as an escape and closing the pipe). I pushed a test version in the debug branch with ssh options …

An avenue for further troubleshooting would be to make a copy of that chunk, then replace it with a uniform (say, all …
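A sketch of that chunk-substitution test, writing zero bytes over the chunk while preserving a copy (all paths hypothetical):

```
# keep the original so it can be restored afterwards
cp /mnt/backups/<archive-dir>/<chunk> /tmp/chunk.orig

# overwrite the chunk with a uniform pattern of identical size
SIZE=$(stat -c %s /tmp/chunk.orig)
dd if=/dev/zero of=/mnt/backups/<archive-dir>/<chunk> bs=1 count="$SIZE"
```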
@kennethrrosen I forgot to mention there is another way to test and confirm/eliminate a qvm-run or ssh connection issue, which is to run Wyng on the remote system itself (pointing to a 'file:/' URL) with a command like …
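Such a local run might look like the following, with the archive path and volume name invented and the wyng syntax approximate:

```
# run a verify directly on the remote system, bypassing ssh/qvm-run entirely
sudo wyng verify --dest=file:/mnt/backups/qubes.backup vm-work-private
```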
@kennethrrosen I sent you an email from my protonmail account. |