On Sun, 12 Jul 2020 at 18:09, Chris Murphy <lists@colorremedies.com> wrote:
On Sun, Jul 12, 2020 at 5:39 AM Andy Mender <andymenderunix@gmail.com> wrote:
>
> On updates, a single automatic corrupted snapshot can
> potentially hose the entire snapshotted volume.

How do you mean? If this is a sort of superficial corruption like a
bad/failed/partial update, inconsistency between package manager and
what's installed - this can be self-contained to a specific snapshot.
One possible idea for updates is to take a snapshot and do the update
out of band (not on the currently running sysroot) in that snapshot.
If the update fails for whatever reason, destroy the snapshot.
Corruption that affects multiple subvolumes wouldn't be related to
snapshotting, but to the shared trees: extent, chunk, csum, uuid, etc.
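
The out-of-band update idea above can be sketched roughly as follows.
This is only an illustration, not how any existing tool does it: the
function names, the `run` hook, and the paths in the usage note are
made up for this sketch. Only `btrfs subvolume snapshot` and
`btrfs subvolume delete` are real commands. It assumes the update
command knows how to operate on the snapshot path rather than on the
running root.

```python
import subprocess
from typing import Callable, Sequence


def run_checked(cmd: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(list(cmd)).returncode == 0


def update_in_snapshot(
    source: str,
    snapshot: str,
    update_cmd: Sequence[str],
    run: Callable[[Sequence[str]], bool] = run_checked,
) -> bool:
    """Snapshot `source`, run `update_cmd`, and delete the snapshot on failure.

    `update_cmd` is a placeholder for whatever updates the tree rooted
    at `snapshot`. Returns True if the update succeeded.
    """
    # Take a writable snapshot of the source subvolume.
    if not run(["btrfs", "subvolume", "snapshot", source, snapshot]):
        return False  # no snapshot was created, nothing to clean up

    # Apply the update to the snapshot, not to the running sysroot.
    if run(list(update_cmd)):
        return True  # the caller decides how to switch over to it

    # The update failed: throw the snapshot away.
    run(["btrfs", "subvolume", "delete", snapshot])
    return False
```

A call might look like `update_in_snapshot("/", "/.snapshots/update",
my_update_cmd)`, where both the snapshot path and `my_update_cmd` are
hypothetical. Passing a different `run` function makes it possible to
dry-run the sequence without touching a real file system.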

I'm sorry, I should've been a little more specific. What I meant was that a corrupted snapshot can potentially impact the subvolume and put it in a state in which simply deleting the latest snapshot is not going to help or can't easily be done.

> Also, if your system is almost broken after the change,
> no snapshot will help.

I'm not sure about the nature of the brokenness in your example. Btrfs
does have a concept of a volume-wide snapshot, which is the seed
device. The file system is merely marked read-only, but can have a
second device added that accepts all writes. If this two-device volume
were to become irreversibly confused, it'd still be possible to revert
to the read-only device - even temporarily - as a kind of "recovery"
boot. With extreme prejudice, a true factory reset is possible by
wiping the read-write 2nd device and starting over. It's also possible
to use it for replication - by adding a 2nd device and removing the
1st, an exact copy is made. This is a whole separate ball of wax, and
while there are ideas for how it might be leveraged, there's no plan to
do so yet.
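
For reference, the seed device workflow described above maps onto
roughly the following command sequence. This is a sketch that only
builds the command lists and runs nothing. The function names and the
device paths in the usage note are hypothetical. It assumes the seed
file system is unmounted when `btrfstune` runs and that a remount is
enough to make the volume writable after the second device is added.

```python
from typing import List


def seed_device_plan(seed_dev: str, rw_dev: str, mountpoint: str) -> List[List[str]]:
    """Commands to turn `seed_dev` into a seed and layer `rw_dev` on top."""
    return [
        # Mark the (unmounted) file system as a seed device.
        ["btrfstune", "-S", "1", seed_dev],
        # A seed file system can only be mounted read-only.
        ["mount", seed_dev, mountpoint],
        # Add the second device, which will accept all writes.
        ["btrfs", "device", "add", rw_dev, mountpoint],
        # Make the two-device volume writable.
        ["mount", "-o", "remount,rw", mountpoint],
    ]


def replicate_plan(seed_dev: str, mountpoint: str) -> List[List[str]]:
    """Commands to drop the seed, leaving a full copy on the second device."""
    return [["btrfs", "device", "remove", seed_dev, mountpoint]]
```

For example, `seed_device_plan("/dev/sdX1", "/dev/sdY1", "/mnt")`
returns four commands in the order they would be run. A "recovery"
boot in this model simply mounts the seed device on its own.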

I agree, but it requires adding a second device, and sometimes that's tricky or not possible. I extrapolated a lot, but sometimes btrfs tools are marketed as a "catch-all" that can save the user from accidental installations or updates, and that's not always true.