Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snapshots: Optionally inform about creating internal snapshots #1554

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mvollmer
Copy link
Member

@mvollmer mvollmer commented Apr 12, 2024

This is based on and meant to go together with the RHEL product documentation.

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/configuring_and_managing_virtualization/creating-virtual-machine-snapshots_configuring-and-managing-virtualization

That documentation explicitly calls out internal snapshots as deprecated and strongly advises against using them in production environments.

However, it also says that "Internal snapshots might work for your use case, but Red Hat does not provide full testing and support for them. ", external snapshots don't work in all cases, and upstream libvirt does not consider them deprecated so Cockpit should probably not fully prevent their use on RHEL. (And there are no requests to us for preventing them.)

Thus, let's have a little notice in the creation dialog with a link to the product documentation.

image

@mvollmer
Copy link
Member Author

I have to better understand the motivation for this. Maybe we should actively warn people when they are about to create a internal snapshot. Maybe we should put a short and neutral explainer text in the dialog instead of just one word in the title.

@martinpitt
Copy link
Member

I'm afraid I can't help you with the motivation either -- this feels like an internal implementation detail which users really shouldn't need to care about. But putting that into the title WFM -- at least this is nice as a discussion ground with the libvirt folks. Thanks!

@mvollmer
Copy link
Member Author

I'm afraid I can't help you with the motivation either -- this feels like an internal implementation detail which users really shouldn't need to care about.

Internal details matter when debugging issues, and this is an easy way to tell your support contact on IRC what is going on, even if you yourself have no idea what it means.

@mvollmer mvollmer marked this pull request as ready for review April 16, 2024 08:14
@mvollmer
Copy link
Member Author

I have to better understand the motivation for this. Maybe we should actively warn people when they are about to create a internal snapshot. Maybe we should put a short and neutral explainer text in the dialog instead of just one word in the title.

The motivation seems to be to just provide information which kind of snapshot Cockpit is making, to help with support cases.

I think putting it in the title might actually be the least distracting and most clearest way to do this. Putting a paragraph of text in the dialog body calls too much attention to this detail. Adding a warning against internal snapshots is not warranted; they are just fine in the cases Cockpit uses them.

An alternative would be to not say anything during creation of snapshots, but to list the type of snapshots in the snapshot card.

@garrett
Copy link
Member

garrett commented Apr 17, 2024

this feels like an internal implementation detail which users really shouldn't need to care about

Yep.

@mvollmer
Copy link
Member Author

this feels like an internal implementation detail which users really shouldn't need to care about

Yep.

Agreed. I don't like changing the dialog either. The user might not understand why that word is in the title, and if they do and they want to have the other type of snapshots, Cockpit wont give it to them anyway. The title of the dialog needs to repeat the title of the button.

Instead, let's just put a new entry into "Hypervisor details":

image

@garrett, wdyt?

@martinpitt
Copy link
Member

@mvollmer That's too coarse-grained IMHO. You can very well have internal and external snapshots on the same VM. Not necessarily created by cockpit, but the CLI does allow it. I liked your idea about showing an internal/external label in the snapshots table -- that feels like the right choice wrt. how the API works, that you don't have a choice about it in cockpit, and conveying the information.

@garrett
Copy link
Member

garrett commented Apr 17, 2024

Snapshot type internal doesn't tell me anything useful though. What does that mean? What is the alternative? How do you change it? Why would you change it? Where is "internal" stored? Etc.

(I'd assume it stores the snapshot next to the disk image, wherever that is. Which is probably what people would want a vast majority of the time?)

This really seems like it's an unnecessary implementation detail. Please feel free to tell me otherwise, though.

@mvollmer
Copy link
Member Author

mvollmer commented Apr 17, 2024

I liked your idea about showing an internal/external label in the snapshots table -- that feels like the right choice wrt. how the API works, that you don't have a choice about it in cockpit, and conveying the information.

However, that only tells people what they have done after the fact.

@martinpitt
Copy link
Member

However, that only tells people what they have done after the fact.

Yes, IMHO that's fine. As far as I remember, the libvirt team doesn't even want us to give a choice about internal/external in the creation dialog, so there's nothing to decide for the user anyway? If we offer that choice, then it needs to go into the dialog, of course.

But in no case is this a hypervisor or VM property.

@mvollmer
Copy link
Member Author

So, going back to the original feature request, I think the best would be to precisely detect when people are about to do something unsupported, and warn them about it.

In this case this would be creating an internal snapshot of a running machine on RHEL. So maybe we do just that.

(I didn't even know until a minute ago that internal snapshots also store RAM... )

@mvollmer
Copy link
Member Author

But in no case is this a hypervisor or VM property.

Well, I could argue that it's a derived property and it means "the type of snapshots that cockpit will create", and that is solely determined by the VM (but might change over time).

But I wont. :-)

@martinpitt
Copy link
Member

and warn them about it. In this case this would be creating an internal snapshot of a running machine on RHEL.

IMHO this would be a terrible UX. Cockpit should not create snapshots which the API doesn't offer (and our current feature detection should be quite adequate -- may have bugs, of course). I.e. on current RHEL it should always create an external snapshot, with RAM (if the VM is running) or without RAM (if it's off).

It could of course be that the API offers functionality which the libvirt team doesn't want to support. IMHO that's just lying to ourselves, and then I'd rather not offer the functionality at all than hiding behind some "but I told you it may break" sign. Users will quickly learn to ignore these anyway, and they are not at all helpful. It's our responsibility as OS builders to define functionality and make sure it effing works.

@mvollmer
Copy link
Member Author

I.e. on current RHEL it should always create an external snapshot, with RAM (if the VM is running) or without RAM (if it's off).

There are cases in vmSupportsExternalSnapshots that look like they might happen on RHEL:

    // If at leat one disk has internal snapshot preference specified, use internal snapshot for all disk,
    // as mixing internal and external is not allowed
    const disks = Object.values(vm.disks);
    if (disks.some(disk => disk.snapshot === "internal")) {
        logDebug(`vmSupportsExternalSnapshots: vm ${vm.name} has internal snapshot preference specified`);
        return false;
    }

    // Currently external snapshots works only for disks where we can specify a local path of a newly created disk snapshot file
    // This rules out all disks which are not file-based, or pool-based where pool is of type "dir"
    // or media-less disks
    const supportedPools = storagePools.filter(pool => pool.type === "dir" && pool.connectionName === vm.connectionName);
    if (!disks.every(disk =>
        (disk.type === "file" && disk.source?.file) ||
        (disk.type === "volume" && supportedPools.some(pool => pool.name === disk.source.pool))
    )) {
        logDebug(`vmSupportsExternalSnapshots: vm ${vm.name} has unsupported disk type`);
        return false;
    }

@martinpitt
Copy link
Member

@mvollmer Ah right, I forgot about this hilarious hackery. As soon as you attach e.g. an empty CD drive or other devices to the VM, external snapshots don't work any more. I don't know whether this is by design, or bug workarounds -- these are definitively good matter to discuss with libvirt folks.

@mvollmer
Copy link
Member Author

IMHO this would be a terrible UX.

Yeah, and also bound to be incorrect as the supported/not-supported status of things change. I have underestimated the complexity enterprises can create re the "is it supported?" questions...

@mvollmer
Copy link
Member Author

This is cursed...

@mvollmer
Copy link
Member Author

mvollmer commented Apr 22, 2024

Ok, my understanding has evolved a bit... :-)

The upcoming documentation for RHEL 9.4 will talk extensively about the preconditions for creating supported snapshots: The VM must only have disks of type "file". The documentation also uses the terms "external" and "internal" to distinguish between supported snapshots and "deprecated snapshots that might work for you but are not fully tested or supported". It does not explain the technical differences or go into any further detail.

(I don't know I can quote draft docs here. Ask me for a link if you want to read the details yourself.)

So let's just expose this in Cockpit: If we make a internal snapshot because there is a disk that is not of type "file", we put a paragraph into the dialog that says this and has a link to the "Support limitations for virtual machine snapshots" chapter of the product documentation. We only do this on RHEL.

The text of the paragraph could be

" The snapshot will be created in the "internal" format because
this virtual machine uses disks with a source type other than "File".
This format is not fully tested or supported. See [the product documentation]
for further details. "

I wouldn't even render it as a warning, just as info.

The product documentation also has awkward instructions for checking whether Cockpit has created a external or internal snapshot, using virsh and grep on the command line.
To help with this, we could add a little (i) icon somewhere in our table (on RHEL only) that explains the support status of a snapshot.

@mvollmer
Copy link
Member Author

The other scenario where Cockpit creates internal snapshots is when libvirt doesn't implement them at all. We don't need to tell people about that.

@martinpitt
Copy link
Member

The VM must only have disks of type "file".

Interesting -- that seems much more restrictive than even what we have now -- e.g. drives without media (empty CD-ROM) would then also not be supported?

The snapshot will be created in the "internal" format because this virtual machine uses disks with a source type other than "File". This format is not fully tested or supported.

If there's lots of pressure, I can live with showing that in the dialog, but it's still nonsense IMHO -- what should the user do? My strong preference for this situation is then to just not offer snapshotting at all. Again we as OS builders have to define which functionality we support and which not -- we can't wash our hands and leave the user with a broken situation.

The other scenario where Cockpit creates internal snapshots is when libvirt doesn't implement them at all

"Them" == "external snapshots", right?

@mvollmer
Copy link
Member Author

The other scenario where Cockpit creates internal snapshots is when libvirt doesn't implement them at all

"Them" == "external snapshots", right?

Ah, yes, sorry. " when libvirt doesn't implement external snapshots at all".

@mvollmer
Copy link
Member Author

The VM must only have disks of type "file".

Interesting -- that seems much more restrictive than even what we have now -- e.g. drives without media (empty CD-ROM) would then also not be supported?

Those are technically of type "file", so they are supported.

@mvollmer
Copy link
Member Author

If there's lots of pressure, I can live with showing that in the dialog, but it's still nonsense IMHO -- what should the user do?

Yeah, hmm, well. Creating internal snapshots is just fine, actually, as far as I know. It's just that the documentation has important words about them, and in my opinion, Cockpit should try hard to make consulting the documentation unnecessary, so we should show those important words in the dialog. Also we have been asked to do this.

The documentation says "they might work for you", so totally preventing people from making internal snapshots also seems wrong.

And note that this is only for RHEL.

@mvollmer
Copy link
Member Author

(/me mumbles ... they should've just fixed whatever is wrong with internal snapshots and not tell anyone about it.)

@mvollmer mvollmer changed the title snapshots: Announce which kind of snapshot is going to be made snapshots: Optionally inform about creating internal snapshots Apr 22, 2024
@mvollmer
Copy link
Member Author

The product documentation also has awkward instructions for checking whether Cockpit has created a external or internal snapshot, using virsh and grep on the command line.
To help with this, we could add a little (i) icon somewhere in our table (on RHEL only) that explains the support status of a snapshot.

I think this is not necessary. Existing internal snapshots are just fine, we only need to worry about informing people when they create more.

@mvollmer
Copy link
Member Author

mvollmer commented May 7, 2024

@garrett, the dialog now looks like this:

image

@mvollmer
Copy link
Member Author

mvollmer commented May 7, 2024

Personally, I think this is a reasonable thing for Cockpit to do, given the current limitations of external snpahots and RHEL's unwillingness to fully support internal ones.

Alternatives:

  • Get RHEL to fully support internal snapshots. I still don't fully understand why they are deprecated. It might be that someone simply decided to stop investing in them to "simplify" but I don't think upstream libvirt will ever deprecate or remove them. In fact, upstream libvirt still defaults to internal snapshot for compatibility reasons.

  • Improve the implementation of external snapshots to also work in file-based pools. I'll see if I can find the paper work that tracks this.

  • Don't say anything in the Cockpit UI and let people find the product documentation themselves and run the command line snippets in it to figure out whether they are about to do something that is not fully supported.

  • Refuse to make internal snapshots on RHEL in Cockpit. But the error message would need to contain pretty much the same text as the proposed dialog here in order to be helpful. We can of course also refuse to be helpful and just disable the button without explanation.

The snapshot dialog will get more complicated in the future (with "disk-only" option etc.), but whatever we figure out here will be applicable then. Thus, I would like to make progress here.

@mac2net
Copy link

mac2net commented May 7, 2024

I started this thread – after trying the snapshots out and experiencing some confusion even though I was following this thread all along.

My suggestion is to add as much documentation in the interface as possible. It's a new feature and an important one.

90% of the time less is more, but not this time IMO.

Comment on lines +198 to +202
if (!isExternal && preferExternalDocURL) {
const link = (
<a target="blank" rel="noopener noreferrer" href={preferExternalDocURL}>
{_("the product documentation")}
</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 5 added lines are not executed by any test.

Comment on lines +205 to +206
if (noExternalReason == "non-file-volume") {
prefer_external_info = fmt_to_fragments(_("This snapshot will be created in the deprecated \"internal\" format because some of the disks have a source that is not of type \"File\". See $0 for more information about this format and its limitations."), link);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 added lines are not executed by any test.

Comment on lines +216 to +220
<HelperText>
<HelperTextItem>
{prefer_external_info}
</HelperTextItem>
</HelperText>}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 5 added lines are not executed by any test.

Comment on lines +890 to +891
logDebug(`vmNoExternalSnapshotsReason: vm ${vm.name} has no external snapshot support`);
return "not-implemented";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 added lines are not executed by any test.

Comment on lines +898 to +899
logDebug(`vmNoExternalSnapshotsReason: vm ${vm.name} has internal snapshot preference specified`);
return "explicit-internal";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 added lines are not executed by any test.

@mvollmer
Copy link
Member Author

mvollmer commented May 7, 2024

My suggestion is to add as much documentation in the interface as possible. It's a new feature and an important one.

Hmm, there isn't much new here, is there? Cockpit has been able to make snapshots of virtual machines since version 225, released in 2020.

What has happened is that RHEL wants to move away from one snapshot format to another. Ideally, this is a implementation detail and Cockpit will just make snapshots in the newer format, with no need to tell the user about it. And we don't want to explain in the UI the difference between "internal" and "external" snapshots, if at all possible.

But RHEL really really wants people to stop using the old format, before the new one can handle all cases that the old one could. So there are cases where we must decide whether to make a internal snapshot, or refuse to make any. (And I am still of the opinion that there is nothing bleeping wrong with the old snapshot format, so it would be a bit of a shame to prevent people from getting their job done with Cockpit.)

Fedora does not want people to stop using the old format as badly, afaik, so Cockpit will not show this message there.

@mvollmer mvollmer added blocked and removed blocked labels May 13, 2024
@mvollmer
Copy link
Member Author

mvollmer commented May 13, 2024

The RHEL product documentation has been published. I have updated the description of this PR with my best shot at selling it:

This is based on and meant to go together with the RHEL product documentation.

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/configuring_and_managing_virtualization/creating-virtual-machine-snapshots_configuring-and-managing-virtualization

That documentation explicitly calls out internal snapshots as deprecated and strongly advises against using them in production environments.

However, it also says that "Internal snapshots might work for your use case, but Red Hat does not provide full testing and support for them. ", external snapshots don't work in all cases, and upstream libvirt does not consider them deprecated so Cockpit should probably not fully prevent their use on RHEL.

Thus, let's have a little notice in the creation dialog with a link to the product documentation.

@mvollmer
Copy link
Member Author

Afaics, external snapshots will eventually be able to make snapshots also for disks in certain pools, and then Cockpit can be changed to use them always. One issue that tracks this: https://gitlab.com/libvirt/libvirt/-/issues/631

We could already now use them always and just let people run into the libvirt error messages...

@mvollmer
Copy link
Member Author

For the record, this is the six year history of trying to get rid of internal snapshots: https://bugzilla.redhat.com/show_bug.cgi?id=1621944

@garrett
Copy link
Member

garrett commented May 14, 2024

Sorry... I've been out on PTO over the past few weeks (May in Europe, right?) and have drifted in and out of this ongoing conversation. So I don't remember all the details, as I've been multitasking not just between work and non-work but also between all the different tasks from both teams. We really need to sync up on snapshotting, probably in a call soon.

Meanwhile, I have some questions and comments:

Where are the internal snapshots deprecated? When will they be removed? Is this a RHEL only thing or are they still supported outside of RHEL? If someone creates a snapshot with the deprecated format, what happens when support is removed?

(Deprecated within software specifically means to avoid and find a path off of it, as it will be removed in a future version. If it's not going to be removed, then it's "not supported", not "deprecated".)


Note: External snapshots are surprising, dangerous, and cannot be undone. They tie the storage of a VM to the external snapshots forever, where all files have to be associated with each other and backed up together. The disk is no longer simply the disk as one might expect, but it's only disk's state before the snapshots occured, and everything after lives on in different files elsewhere on storage. This would most definitely be very surprising to anyone who is backing up VMs.

As a result, we most definitely should warn about this and try to figure out if there's some way to be able to undo it later, even if that's Cockpit-specific for the time being. (And push for upstream to fix this in their tooling.)

So, while this issue is about a warning for internal snapshots, I think we really more critically need a warning on external snapshots, even if they're supported and internal snapshots are not "supported" (at least in RHEL). Of course, we need to decide what to do for internal snapshots too.


As soon as you attach e.g. an empty CD drive or other devices to the VM, external snapshots don't work any more.

Is that pre-snapshotting or after you make a snapshot? Or both? Does this mean once you enable snapshots you cannot add an ISO or any devices?

Or does it mean any VMs with a CD drive or other devices cannot be snapshotted? Can you remove the devices and snapshot? Can we do this in the snapshot dialog?

As this is about external snapshotting, I guess we should solve all of that in an external snapshotting issue/PR instead of here?

Although, we need to figure out snapshotting overall.


Also, we probably need to keep supporting the snapshotting that has already been done within Cockpit indefinitely in some form, even if not deemed officially "supported".

And then, if someone is using an unofficial snapshot type (internal), then we need to provide a means to convert that to the other type (external) somehow. There needs to be some kind of migration path other than "delete this VM and create a new VM".

@mvollmer
Copy link
Member Author

Where are the internal snapshots deprecated?

Since RHEL 8, afaik. They used to be supported in RHEL 6 and 7.

When will they be removed?

I think the current plan is to remove the ability to create internal snapshots in RHEL 10. But this has been the plan for six years now. I wouldn't bet on it actually happening in RHEL 10.

Is this a RHEL only thing or are they still supported outside of RHEL?

I don't know whether for example, Oracle or Canonical offer paid support for them. Upstream libvirt and qemu still support internal snapshots in the sense that they will accept bugs and patches for them. But they will likely not receive feature development.

If someone creates a snapshot with the deprecated format, what happens when support is removed?

I don't know... if the actual code is removed, then you can no longer delete or revert internal snapshots, and you have to figure out what to do. Probably throw away the VM and make a new one.

Note: External snapshots are surprising, dangerous, and cannot be undone.

All the effects of external snapshots disappear when the last one is deleted.

They tie the storage of a VM to the external snapshots forever, where all files have to be associated with each other and backed up together. The disk is no longer simply the disk as one might expect, but it's only disk's state before the snapshots occured, and everything after lives on in different files elsewhere on storage. This would most definitely be very surprising to anyone who is backing up VMs.

Yes, external snapshots increase the complexity quite a bit.

As a result, we most definitely should warn about this and try to figure out if there's some way to be able to undo it later, even if that's Cockpit-specific for the time being. (And push for upstream to fix this in their tooling.)

You can delete all external snapshots and then you are back to a simple disk configuration.

So, while this issue is about a warning for internal snapshots, I think we really more critically need a warning on external snapshots, even if they're supported and internal snapshots are not "supported" (at least in RHEL). Of course, we need to decide what to do for internal snapshots too.

Ouch. :-) I kinda share your opinion about the surprising complexity of external snapshots (and I had to throw away tens of VMs while learning the details), but if they are supported by Red Hat, who cares? :-) I think we should go with the flow here. But feel free to try and get RHELs plans changed in this area. :-)

As soon as you attach e.g. an empty CD drive or other devices to the VM, external snapshots don't work any more.

This is not accurate, those were bugs in Cockpit. External snapshots currently don't work with disks that are backed by volumes in file-based pools. I hope that this can be fixed soonish, and then external snapshots should cover all cases that internal ones do. (I think, still no expert.)

As this is about external snapshotting, I guess we should solve all of that in an external snapshotting issue/PR instead of here?

When these two issues are fixed, we don't need the changes in this PR here anymore:

https://gitlab.com/libvirt/libvirt/-/issues/634

https://gitlab.com/libvirt/libvirt/-/issues/631

And then, if someone is using an unofficial snapshot type (internal), then we need to provide a means to convert that to the other type (external) somehow. There needs to be some kind of migration path other than "delete this VM and create a new VM".

I don't think this will happen. It's much easier to keep around the abilities to delete and revert internal snapshots, even if they can't be created anymore.

And to answer the question: "WHY have internal snapshots been deprecated?": My best guess is that people don't want to spend effort on any potential future issues with them, not because they would have known issues. They have missing features comapred to external ones, but they are not broken per se, afaict.

The issue that external snapshots don't work with pools is unfortunate, but hopefully only temporary.

@mvollmer
Copy link
Member Author

Let's block this on https://gitlab.com/libvirt/libvirt/-/issues/631.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants