On Wed, Aug 28, 2019 at 2:40 PM Josef Bacik
<josef(a)toxicpanda.com> wrote:
>
> On Wed, Aug 28, 2019 at 02:35:39PM -0400, Laura Abbott wrote:
> > On 8/28/19 1:58 PM, Josef Bacik wrote:
> > > On Tue, Aug 27, 2019 at 07:53:20AM -0400, Laura Abbott wrote:
> > > > On 8/26/19 11:39 PM, Neal Gompa wrote:
> > > > > On Mon, Aug 26, 2019 at 11:16 AM Laura Abbott <labbott(a)redhat.com> wrote:
> > > > > >
> > > > > > On 8/23/19 9:00 PM, Chris Murphy wrote:
> > > > > > > On Fri, Aug 23, 2019 at 1:17 PM Adam Williamson
> > > > > > > <adamwill(a)fedoraproject.org> wrote:
> > > > > > >
> > > > > > > > So, there was recently a Thing where btrfs installs were broken, and
> > > > > > > > this got accepted as a release blocker:
> > > > > > > >
> > > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1733388
> > > > > > >
> > > > > > > Summary: This bug was introduced and discovered in linux-next, it
> > > > > > > started to affect Fedora 5.3.0-rc0 kernels in openqa tests, patch
> > > > > > > appeared during rc1, and the patch was merged into 5.3.0-rc2. The bug
> > > > > > > resulted in a somewhat transient deadlock which caused installs to
> > > > > > > hang, but no corruption. The fix, 2 files changed, 12 insertions, 8
> > > > > > > deletions (1/2 the insertions are comments).
> > > > > > >
> > > > > > > How remarkable or interesting is this bug? And in particular, exactly
> > > > > > > how much faster should it have been fixed in order to avoid worrying
> > > > > > > about it being a blocker bug?
> > > > > > >
> > > > > > > 7/25 14:27 utc bug patch was submitted to linux-btrfs@
> > > > > > > 7/25 22:33 utc bug was first reported in Fedora bugzilla
> > > > > > > 7/26 19:20 utc I confirmed upstream's patch related to this bug with
> > > > > > > upstream and updated the Fedora bug
> > > > > > > 7/26 22:50 utc I confirmed it was merged into rc2, and updated the Fedora bug
> > > > > > >
> > > > > > > So in the context of status quo, where Btrfs is presented as an option
> > > > > > > in the installer and if there are bugs they are Beta blocking, how could
> > > > > > > or should this have been fixed sooner? What about the handling should
> > > > > > > have been different?
> > > > > > >
> > > > > >
> > > > > > That's a fair question. This bug actually represents how this _should_
> > > > > > work. The concern is that in the past we haven't seen a lot of engagement.
> > > > > > Maybe today that has changed as demonstrated by this thread.
> > > > > > I'm still concerned about having this be a blocker vs. just keeping it
> > > > > > as an option, simply because a blocker stops the entire release and it
> > > > > > can be a last minute scramble to get things fixed. This was the ideal
> > > > > > case for a blocker bug and I'm skeptical about all bugs going this well.
> > > > > > If we had a few more people who were willing to be on the btrfs alias and
> > > > > > do the work for blocker bugs it would be a much stronger case.
> > > > > >
> > > > >
> > > > > Out of curiosity, how many such issues have we had in the past 2
> > > > > years? I personally can't recall any monumental occasions where people
> > > > > were scrambling over *Btrfs* in Fedora. If anything, we continue to
> > > > > inherit the work that SUSE and Facebook are doing upstream as part of
> > > > > us continually updating our kernels, which I'm grateful for.
> > > > >
> > > > > And in the instances where we've had such issues, has anyone reached
> > > > > out to btrfs folks in Fedora? Chris and myself are the current ones,
> > > > > but there have been others in the past. Both of us are subscribed to
> > > > > the linux-btrfs mailing list, and Chris has a decent rapport with most
> > > > > of the btrfs developers.
> > > > >
> > > > > What more do you want? Actual btrfs developers in Fedora? We don't
> > > > > have any for the majority of filesystems Fedora supports, only XFS. Is
> > > > > there some kind of problem with communicating with the upstream kernel
> > > > > developers about Fedora bugs that I'm not aware of?
> > > > >
> > > >
> > > > Again, it's about length of overall development. ext and XFS have
> > > > a much longer history in general which is something that's important
> > > > for file system stability in general. It's also a bit of a catch-22
> > > > where the rate of btrfs use in Fedora is so low we don't actually
> > > > see issues.
> > > >
> > > > > > > I note here that ext2 and ext3 are offered as file systems in
> > > > > > > Custom/Advanced partitioning and in this sense have parity with Btrfs.
> > > > > > > If this same bug occurred in ext2 or ext3 would or should that cause
> > > > > > > discussion to drop them from the installer, even if the bug were fixed
> > > > > > > within 24 hours of discovery and patch? What about vfat? That's
> > > > > > > literally the only truly required filesystem that must work, for the
> > > > > > > most commonly supported hardware so it can't be dropped, we'd just be
> > > > > > > stuck until it got fixed. That work would have to be done upstream,
> > > > > > > yes?
> > > > > > >
> > > > > >
> > > > > > I don't think that's really a fair comparison. Just because options
> > > > > > are presented doesn't mean all of them are equal. ext2/ext3 and vfat
> > > > > > have been in development for much longer than btrfs and length of development
> > > > > > is something that's particularly important for file system stability
> > > > > > from talking with file system developers. It's not impossible for there
> > > > > > to be bugs in ext4 for example (we've certainly seen them before) but
> > > > > > btrfs is only now gaining overall stability and we're still more likely to see
> > > > > > bugs, especially with custom setups where people are likely to find
> > > > > > edge cases.
> > > > > >
> > > > >
> > > > > Nope. We can totally use this because LVM has not existed as long (we
> > > > > use LVM + filesystem by default, not plain partitions), and we still
> > > > > encounter quirks with things like thinp LVM combined with these
> > > > > filesystems. OverlayFS is mostly hot garbage (kernel people know it,
> > > > > container people know it, filesystem people know it, etc.), and yet we
> > > > > continue to try to use it in more places. Stratis is in an odd state
> > > > > of limbo now, since its main developer and advocate left Red Hat.
> > > > > There are plenty of examples of Red Hat doing crazy/experimental
> > > > > things... I'd like to think Red Hat isn't supposed to be special here,
> > > > > but in this realm, it seems like it is...
> > > > >
> > > > >
> > > >
> > > > btrfs still doesn't give me the warm fuzzies and I also think this
> > > > is a bigger issue than other features simply because user data is at
> > > > stake. We do need to consider that the failure case is not "I can't do X"
> > > > but "my precious data which I have been trying to snapshot is now
> > > > inaccessible" in a way that's even worse than say rpm database
> > > > corruption. Even if it is in the advanced partitioning or not the
> > > > default, we can still end up with people clicking in because they
> > > > read an article about how btrfs was the hot new thing.
> > > >
> > > > There are two parts to this here: killing off btrfs entirely and
> > > > btrfs as release criteria. I think you are correct that there's
> > > > enough community support to justify keeping btrfs around at least
> > > > in the kernel (I can't speak for anaconda here)
> > > >
> > > > As for btrfs as release criteria, I'd feel much more confident
> > > > about that if we could have a file system developer on the btrfs
> > > > alias. I'm glad to hear the btrfs upstream community has been
> > > > receptive to bugs but it's still much easier to make things
> > > > happen if we have contributors who are active in the Fedora
> > > > community, especially if we want the advanced features that
> > > > btrfs has (which is why people want it anyway). So, who would
> > > > you suggest to work with us in Fedora?
> > >
> > > You can always CC me, if I get an email from you or anybody else I recognize
> > > from the fedora kernel team I'm going to pay attention to it.
> > >
> > > Facebook runs more btrfs file systems than Fedora has installs, so we're pretty
> > > happy with how it works stability wise. That being said we're slightly more
> > > fault tolerant than most users. If you guys are hitting problems chances are
> > > we'll hit them eventually as well, so it makes sense for us to be on top of
> > > them.
> > >
> > > I agree it would be better if somebody inside Fedora was able to help out, but
> > > again I'm only an email away. Thanks,
> > >
> >
> > So it appears you are on the btrfs alias already:
> >
> > fedora-kernel-btrfs: fs-maint@redhat.com,josef@toxicpanda.com,bugzilla(a)colorremedies.com
> >
> > This technically meets the requirements if you are willing to stay on this
> > alias and (continue) to help with requests as needed. I would feel more
> > confident if we had a few more people involved as well. Even better
> > would be proactively going through the bugzillas to help find the
> > btrfs ones.
>
> Yeah that goes into a bucket that basically is ignored. The only time I'll peek
> in there is if somebody specifically pokes me, because generally speaking we hit
> the problems and fix them welllllll before Fedora users start to notice them.

Fedora chugs along at the rate of daily upstream Linus snapshots. If
you're hitting and fixing issues before Fedora users see them, I'm
curious why Fedora users would ever see them.

Where does the lag come from? Are the fixes queued internally?
Staged in an upstream subsystem tree? Is there a way for interested
btrfs people to proactively just get those fixed in Fedora before
users hit them?

For this particular example we saw the problem in testing and had a patch on the
mailing list before you hit the problem. It was in a tree and sent to Linus, and
was merged the day after the bugzilla was reported. So yes before users see
them, unless they are subscribed to the daily snapshots, which I assume is just
for testing right? Or were you guys going to ship 5.3-rc0?

On one hand I understand all of the consternation around making btrfs bugs
blockers for Fedora, but on the other hand it seems a bit silly to be having
this conversation at all based on hitting a bug that went into the merge window
and then was fixed before rc1 was even cut. Thanks,

Josef