On Mar 7, 2014, at 6:51 AM, Miloslav Trmač <mitr@volny.cz> wrote:

2014-03-07 14:31 GMT+01:00 Josh Boyer <jwboyer@fedoraproject.org>:

remember. It is only concerned with /usr and to as minimal a degree
as possible /etc. People likely still want snapshot and rollback for
their actual _data_ as well.

(Choosing a random point in the conversation...)

I'm starting to think that snapshots are never the right tool, at best a local optimization:

For the OS and application code and static data: What we really want is the ability to reinstall/redeploy this data if it became lost or corrupted. We don't really want point-in-time snapshots; snapshots are only a local optimization allowing us to "redeploy the version that has been installed yesterday". An ideal technology would allow "instant" deployment of both old and new versions (redeploying an old version and deploying a new version have structurally the same effect on a filesystem), and then snapshots wouldn't be needed.

I find for my use case that snapshots are frequently the right tool. But that's because of the workflow/use case I have. Snapshots aren't inherently good.

If Fedora.next is to be more stable/production oriented than previous Fedoras, then the problem Roller Derby is attempting to solve also changes. I think OSTree may eventually address the OS/application coherent updates problem better than the far less granular snapshot strategies to date, but it remains to be seen if we're going to have the same problems or concerns that motivate the desire for rollbacks in the first place.

If all we're looking to do in the near term is make yum/dnf and GNOME offline updates safer, that could happen relatively quickly with existing tools. But it would require a hard dependency on either LVM thinp or Btrfs snapshots, and changes to perform the update in a chroot on the snapshot rather than on the active tree. That's still significantly easier than maintaining dozens or hundreds of snapshots, which is what both yum-plugin-fs-snapshot and snapper do.
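To make the "update the snapshot, not the active tree" idea concrete, here is a rough sketch in Python, assuming Btrfs. The function names (plan_update, cleanup_command, apply_update) and the example paths are made up for illustration and aren't part of any existing tool. It also assumes the snapshot can be used as a chroot as-is; a real implementation would likely need /proc, /dev and friends bind-mounted first, plus some way to boot into the updated snapshot afterwards, and both are left out here.

```python
import subprocess
from typing import Callable, List

Command = List[str]


def plan_update(active: str, snapshot: str, updater: str = "yum") -> List[Command]:
    """Commands, in order, to update a writable snapshot instead of the active tree."""
    if updater not in ("yum", "dnf"):
        raise ValueError("updater must be 'yum' or 'dnf'")
    return [
        # 1. Take a writable snapshot of the active tree.
        ["btrfs", "subvolume", "snapshot", active, snapshot],
        # 2. Run the update inside the snapshot; the active tree is untouched.
        ["chroot", snapshot, updater, "-y", "update"],
    ]


def cleanup_command(snapshot: str) -> Command:
    """Command that discards a snapshot whose update failed."""
    return ["btrfs", "subvolume", "delete", snapshot]


def _run(cmd: Command) -> int:
    # Default runner: execute the command and report its exit status.
    return subprocess.run(cmd).returncode


def apply_update(active: str, snapshot: str, updater: str = "yum",
                 run: Callable[[Command], int] = _run) -> bool:
    """Run the plan; if the update step fails, delete the snapshot.

    Returns True when the snapshot holds a fully updated tree.
    """
    for index, cmd in enumerate(plan_update(active, snapshot, updater)):
        if run(cmd) != 0:
            if index > 0:
                # The snapshot exists but is half-updated: throw it away.
                run(cleanup_command(snapshot))
            return False
    return True
```

The point of the sketch is that a failed update costs one deleted snapshot and nothing else: the running system never sees a partially applied transaction, and there's no growing pile of snapshots to manage. Calling apply_update("/", "/snapshots/next") would need root and a Btrfs root filesystem; passing a different run callable lets you dry-run the plan.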

Windows and OS X don't do atomic updates either. Windows essentially becomes unusable as updates are applied. OS X application updates require the application to be quit first, which it'll offer to do and then relaunch after the update, while system updates are applied only after user logout, and then the system reboots. But both their "OS trees" (system binaries minus apps) are static. They're essentially identical on every deployment. So the thing being updated has a known initial quantity and quality. Because of this, they don't have nearly as much failsafe testing of the actual update process. The updates themselves just don't fail. Therefore a rollback is a reinstallation. They don't even keep the old kernel around when it's updated, while our GRUB menu entry to fall back to the prior kernel is a kind of rollback.

So it sounds to me like OSTree could enable maybe a dozen common trees (rather than the almost infinite number today). Since they're common, they're also relatively stable, aided by the fact that their start and end states during the course of an update are known. But multiple trees are also more flexible than the Windows/OS X paradigm where they have basically one tree, the only variation of which is its version as provided by those companies.

For users' data: What we really want is backups, definitely on a different disk, ideally off-site. An ideal technology would allow continuous replication of the data elsewhere. Snapshots are at best a way to quickly access a backup from the past hour, but are not at all a replacement for a backup.

Right. Whether GF2 or Gluster or other, I'd like local SSD performance for my home, with a (nearly) synchronous local network replicant, and an async offsite. My preference is a time based snapshot, and that is like any other user data, it gets replicated to the network (be it a NAS or an ARM gluster cluster), and that's replicated offsite.

For configuration: What we really want is a VCS, dealing with changesets, documenting who has changed what, when and why. Snapshots are a really poor VCS.

Obviously we don't have all that technology that we "really want", or at least not in a way that is ready to deploy, but we kind of have snapshots. Let's just not think that snapshots are "right".