mattdm suggested the upcoming scheduler change should be discussed here. I might not have enough time to talk details at the moment, but I noticed this is coming up relatively soon. This is what I understand so far about the decision.
Thanks for all the software,
## CFQ is scheduled for removal
Jens Axboe is planning to remove the CFQ I/O scheduler in 4.21. That is, CFQ is part of the "legacy" single-queue block layer, which is going to be removed for ease of maintenance.
Axboe's last comment on this timing was made *after* the fix for data corruption on MQ, i.e. the data corruption covered by the recent thread on this list.
*Obligatory disclaimer*. Read the paragraph above, and consider waiting for the next stable kernel update, before you test MQ (including BFQ) on your own machine :-).
 "It's definitely still going" - Jens Axboe. https://bugzilla.kernel.org/show_bug.cgi?id=201685#c279
## The kernel wants us to choose our new default
For devices which have only one hardware queue, the new upstream default is mq-deadline. Going from CFQ to mq-deadline is a significant change. For example, the deadline scheduler does not support ionice.
The alternative to the MQ deadline scheduler will be BFQ. Upstream discussed this, and the powers that be (mostly Axboe :-) are explicitly expecting us (downstreams) to make this decision.
 BFQ: http://algo.ing.unimo.it/people/paolo/disk_sched/description.php
 "I'd prefer if the distros would lead the way on this, as they are the ones that will most likely see the most bug reports" - Jens Axboe, https://www.spinics.net/lists/linux-block/msg31062.html
## Arguments for (or against) BFQ?
Paolo Valente kindly wrote an informative response, which I will copy into a separate message. The following is my very limited first impression.
Personally I lean towards BFQ by default. It appears to be nominated as the successor to CFQ, I think it's worthy as such, and it makes distinct improvements of its own. I would recommend it with more confidence if I understood how the improvements work :-).
The deadline scheduler probably isn't a *complete* disaster. Ubuntu ran with the deadline scheduler for a while. I haven't checked whether they changed the tuning knobs though!
RHEL7 defaults to CFQ for SATA drives. This is notable given that it recommends avoiding (or tuning) CFQ on basically any other server hardware (and specifically to avoid it on hardware RAID).
I've tried BFQ on my laptop's hard drive (not SSD). It has some associated tests for responsiveness (the "S" tests). I don't have a real-world feel for it, but I agree the test numbers are an impressive improvement over CFQ.
I don't have results for the S tests on the deadline scheduler. I did note the eponymous "deadline" for sync reads has a default of 500 ms.
The other test result I have is that "deadline" doesn't match CFQ's level of fairness for reads vs. writes, even with the recent addition of WBT. Neither approached what I would actually call fairness. BFQ did. This is due to BFQ's "compensations" for device writeback caching and NCQ, and allegedly for Linux writeback. These extra compensations are the part I don't understand so far.
 RHEL7 IO schedulers:
 I tried to closely match the test from the cover letters from the WBT patch series. I don't have detailed statistics, but I believe deadline+WBT was less fair than CFQ. It was definitely not more fair.
am i really the only one where 4.19.x up to 4.19.9 randomly crashes?
on my homeserver it takes some hours, a ton of virtual machines on ESXi
6.5 are running stable all the time but on a NAT-Firewall guest it
survives just a few seconds until "kernel panic - Fatal exception in
interrupt" and a production webserver shortly after deploy 4.19.9 had
the same yet while i thought after running for days on 4.19.8/4.19.9 the
problem is now gone
back to 4.18.20-100.fc27.x86_64 which had 3 weeks uptime on the same machine
that's the first time any Fedora kernel is that unstable for years, in
2014 or so there was a series which crashed at raid-check on a RAID10
regularly but since then until a few weeks ago every single build rock stable
god, hopefully 4.20.x becomes stable again and rebased ASAP :-(
On Wed, 12 Dec 2018 16:07:49 -0500
Jeff Moyer <jmoyer(a)redhat.com> wrote:
Thanks for your insight. Doesn't look good for my use of BFQ.
> Note that you can change the current I/O scheduler for any block
> device by echo-ing into /sys/block/<dev>/queue/scheduler. Cat-ing
> that file will give you the list of available schedulers.
That's part of the problem. BFQ doesn't appear in the list of
available schedulers. When I cat that location for my disks, I see
[noop]. Since CFQ does appear there if it is compiled into the kernel,
I'll have to look into what is done for CFQ and see how hard it would
be to patch the kernel to repeat that behavior for BFQ.
My use case is not mq, so after reading one of the links in this
thread about performance, I saw that BFQ gave ~20 to 30 % boost in
disk io performance, and enhanced low latency performance (desktop
responsiveness) for single queue. That's what I want to capture by using
BFQ. I wonder if that is my problem. From what Chris said, an mq
scheduler is required in order to use BFQ, whether it is for mq or
single queue use. I'll try that. I normally use deadline and CFQ for
scheduling. Back to the compiler.
I'm surprised this is so difficult. It's been in the kernel since the
2.x series, and usually the configuration options are excellent for
allowing variation in how the kernel is configured.
On the plus side, I notice only slight degradation in behavior using
noop scheduling. :-) Maybe I should just skip scheduling. :-D
OK that worked for an nvme drive, but not for an internal SATA HDD.
$ sudo lsmod | grep bfq
$ sudo cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
$ sudo insmod /usr/lib/modules/4.19.8-300.fc29.x86_64/kernel/block/bfq.ko.xz
$ sudo cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
$ sudo lsmod | grep bfq
bfq 69632 0
So yeah this seems a lot more difficult than it should be.
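
The unchanged list after loading bfq.ko fits what was said earlier in the thread: BFQ is an mq scheduler, so it would only be offered on a queue that is running blk-mq, and sda here still lists the legacy schedulers. A rough Python sketch for reading the sysfs format shown above. The function names and the two name sets are my own assumptions for illustration (scheduler names as I understand them for 4.19), not anything from the kernel tree:

```python
import re
from pathlib import Path

# Scheduler names assumed for a 4.19-era kernel (illustrative only).
LEGACY_NAMES = {"noop", "deadline", "cfq"}
MQ_NAMES = {"none", "mq-deadline", "kyber", "bfq"}


def parse_scheduler_line(line):
    """Split a line such as 'noop deadline [cfq]' into (active, available)."""
    active = None
    available = []
    for token in line.split():
        match = re.fullmatch(r"\[(.+)\]", token)
        if match:
            # The bracketed entry is the scheduler currently in use.
            active = match.group(1)
            available.append(active)
        else:
            available.append(token)
    return active, available


def queue_kind(available):
    """Guess whether the queue is on the legacy or the mq path from the names offered."""
    names = set(available)
    if names & LEGACY_NAMES:
        return "legacy"
    if names & MQ_NAMES:
        return "mq"
    return "unknown"


def read_scheduler(dev):
    """Read and parse /sys/block/<dev>/queue/scheduler for a device such as 'sda'."""
    path = Path("/sys/block") / dev / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text().strip())


if __name__ == "__main__":
    active, available = parse_scheduler_line("noop deadline [cfq]")
    print(active, available, queue_kind(available))
```

If `queue_kind` reports "legacy" for a disk, loading the bfq module alone would not be expected to make bfq appear in the list.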
On Wed, Dec 12, 2018 at 2:07 PM Jeff Moyer <jmoyer(a)redhat.com> wrote:
> Chris Murphy <lists(a)colorremedies.com> writes:
> > I used two boot params: scsi_mod.use_blk_mq=1 elevator=bfq. I don't
> > think that's a good way for a distribution to set the default though.
> You shouldn't need the "scsi_mod.use_blk_mq=1" option. As of 4.19,
> scsi_mq is the default, and by 4.21 the legacy path will be gone. The
> right way for the distro to set the default I/O scheduler is to use udev
Like I mentioned earlier in the thread, Fedora kernels do not set
scsi_mq as the default. I don't know why.
# CONFIG_SCSI_MQ_DEFAULT is not set
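
For reference, the udev approach Jeff mentions generally comes down to a rule that writes to the same queue/scheduler attribute discussed above. A minimal sketch that renders such a rule line. The helper name, the sd[a-z] match and the choice of bfq are assumptions for illustration, not what Fedora ships:

```python
def scheduler_rule(kernel_glob, scheduler):
    """Render one udev rule line that sets the I/O scheduler on matching block devices.

    kernel_glob and scheduler are caller-supplied, e.g. "sd[a-z]" and "bfq".
    """
    return (
        'ACTION=="add|change", '
        f'KERNEL=="{kernel_glob}", '
        f'ATTR{{queue/scheduler}}="{scheduler}"'
    )


if __name__ == "__main__":
    # Print a rule that could be placed in a file under /etc/udev/rules.d/.
    print(scheduler_rule("sd[a-z]", "bfq"))
```

The write can only take effect if the named scheduler is actually offered for that queue, so on a kernel without scsi_mq enabled a bfq rule like this would presumably not change anything for SATA disks.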
Currently sctp.ko and sctp_probe.ko are present in mod-extra.list.
The problem is that sctp_probe.ko was removed upstream in kernel 4.16
and sctp_diag.ko was added, so sctp.ko is not inside kernel-modules-extra
anymore, but it's inside kernel-modules (due to bz#1656580).
This means that before Fedora "kernel-4.16" sctp.ko was in
"kernel-modules-extra" and after "kernel-4.16" sctp.ko is inside kernel-modules
This commit removes sctp_probe.ko and adds sctp_diag.ko to mod-extra.list.
This means that both sctp.ko and sctp_diag.ko will be moved to
Signed-off-by: Timothy Redaelli <tredaelli(a)redhat.com>
mod-extra.list | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mod-extra.list b/mod-extra.list
index f5841c96..550056ea 100644
@@ -134,7 +134,7 @@ sch_red.ko
how does Fedora think to handle
4.18.20 is the last 4.18.x and happily at least 4.18.20 from F27 runs
fine on F28 systems but given that bug, upgrading to 4.19.x seems to be
just a lottery with your data - on some or even many systems all seems
to be fine BUT when you are affected and nobody until now knows the root
honestly the rebase to 4.19 was too fast to start with!
WHY did that rebase happen at all?
kernel-4.19.2-200.fc28 jcline 2018-11-15 01:34:18
bug reported: 2018-11-13 19:42:20 UTC
and yes i had that bugreport in my inbox just because i have subscribed
to filesystem / raid related mailing lists and the word "crazy" appeared
in my mind when i saw the rebase two days later because i expect that
downstream maintainers at least have the same lists subscribed ordinary
Since it was brought up on fedora-devel, I said I would post these monthly:
          27   28   29  rawhide
open:      0  204  109      237  (550)
opened:    0   14   78       13  (105)
closed:   49  100   24        7  (180)
If you would like to help improve these stats,
https://fedoraproject.org/wiki/KernelBugTriage is a great place to start!