I updated my system this morning. The updated packages included a new kernel and some SELinux packages, among other things (the complete list is attached). I now find that neither of my QEMU/KVM guests (one Fedora, one Windows 10) has Internet access, though both can reach my host. They were both working perfectly before the update. Nothing else in my system has changed (in particular, I haven't touched the firewall rules, and the last updates to NetworkManager or QEMU were several days ago).
I rebooted to the previous kernel - no difference.
I set SELinux to permissive and rebooted the Fedora guest - no difference.
Before trying to downgrade the entire update, is there anything else I can do?
poc
On 24/01/2019 –– 12:08:03PM +0000, Patrick O'Callaghan wrote:
> Before trying to downgrade the entire update, is there anything else I can do?
Can you boot from another kernel? I have had problems with networking and the GUI with kernel-4.20.3-200.fc29.x86_64 on a Xen Dom0, so it might be worth trying a different kernel before downgrading. 4.19.15 works on my machine.
Michael Young
On Thu, 2019-01-24 at 21:41 +0000, YOUNG, MICHAEL A. wrote:
> Can you boot from another kernel? I have had problems with network and GUI with kernel-4.20.3-200.fc29.x86_64 on a xen Dom0, so it might be worth trying a different kernel before a downgrade. 4.19.15 works on my machine.
As I said, I tried a different kernel with no effect.
poc
On Thu, 2019-01-24 at 14:01 +0100, Kai Bojens wrote:
> Well, what do the logfiles tell you? Does journalctl have any information about this? Are there any error messages?
The logfiles show nothing. There are no error messages. The network simply says "no route to host".
Yesterday I noted that the contents of /etc/resolv.conf seemed wrong. When I fixed this manually, the network came back up.
When I rebooted the host this morning (to the latest kernel), the guest network was down again. This time I can't even ping the host's IP address so editing /etc/resolv.conf solves nothing.
poc
On 1/24/19 8:08 PM, Patrick O'Callaghan wrote:
> [...]
> Before trying to downgrade the entire update, is there anything else I can do?
What type of network is defined for your guests? I'm using macvtap instead of NAT and all is working fine. My host is a fully updated F29/KDE and the guest is fully updated F28/KDE.
On Thu, 2019-01-24 at 22:40 +0800, Ed Greshko wrote:
> What type of network is defined for your guests? I'm using macvtap instead of NAT and all is working fine. My host is a fully updated F29/KDE and the guest is fully updated F28/KDE.
I use NAT (with virtio), as I have always done. According to the virt-manager config widget, "macvtap does not work for host->guest communication".
The Fedora guest is F28 Server. When I reported the problem yesterday it hadn't been updated in months. I updated it last night but it still isn't working (see my reply to Kai Bojens kb@kbojens.de in this thread).
poc
On 1/25/19 8:06 PM, Patrick O'Callaghan wrote:
> I use NAT (with virtio), as I have always done. According to the virt-manager config widget, "macvtap does not work for host->guest communication".
Right, it doesn't. But I don't want to use NAT, since I use IPv6 stateless mode and that doesn't work with NAT.
> The Fedora guest is F28 Server. When I reported the problem yesterday it hadn't been updated in months. I updated it last night but it still isn't working (see my reply to Kai Bojens kb@kbojens.de in this thread).
I installed an F29 guest today and it works just fine for me.
Time to break out Wireshark to see if anything is actually being sent/received?
On Fri, 2019-01-25 at 20:13 +0800, Ed Greshko wrote:
> Time to breakout wireshark to see if anything is actually being sent/received?
I was afraid of that. Sigh.
poc
On Fri, 2019-01-25 at 12:40 +0000, Patrick O'Callaghan wrote:
> I was afraid of that. Sigh.
Nothing illuminating, I'm afraid.
41 20.186144769 192.168.122.1 192.168.122.167 ICMP 98 Echo (ping) request id=0x1519, seq=1/256, ttl=64 (no response found!)
(repeated)
This is the host pinging the guest. For completeness, some of the relevant data:
On the host:
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP group default qlen 1000
    link/ether d4:3d:7e:f4:1b:08 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.73/24 brd 192.168.1.255 scope global noprefixroute enp3s0
       valid_lft forever preferred_lft forever
    inet6 fe80::d63d:7eff:fef4:1b08/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:8b:88:60 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:8b:88:60 brd ff:ff:ff:ff:ff:ff
5: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq master virbr0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:1d:55:89 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe1d:5589/64 scope link
       valid_lft forever preferred_lft forever
6: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq master virbr0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:b0:20:88 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:feb0:2088/64 scope link
       valid_lft forever preferred_lft forever
$ ip route
default via 192.168.1.1 dev enp3s0 proto static metric 100
192.168.1.0/24 dev enp3s0 proto kernel scope link src 192.168.1.73 metric 100
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
(The guest is on interface virbr0). For the equivalent settings on the guest, see the attached screenshot.
Pings in both directions fail, in case that wasn't clear. BTW the Windows guest also fails in the same way.
I'm at a loss.
poc
On Fri, 2019-01-25 at 15:51 +0000, Patrick O'Callaghan wrote:
> Pings in both directions fail, in case that wasn't clear. BTW the Windows guest also fails in the same way.
> I'm at a loss.
Just to add that I attempted to set up a fresh Fedora server guest (from a netinst.iso), using the default virt-manager settings. Anaconda couldn't find the network.
poc
On Fri, 2019-01-25 at 17:07 +0000, Patrick O'Callaghan wrote:
> Just to add that I attempted to set up a fresh Fedora server guest (from a netinst.iso), using the default virt-manager settings. Anaconda couldn't find the network.
I'm 99% sure it has something to do with the firewall. Thing is, I haven't touched the firewall rules. Nevertheless I see this:
$ systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-01-25 21:37:32 GMT; 42min ago
     Docs: man:firewalld(1)
 Main PID: 2421 (firewalld)
    Tasks: 3 (limit: 4915)
   Memory: 28.2M
   CGroup: /system.slice/firewalld.service
           └─2421 /usr/bin/python3 /usr/sbin/firewalld --nofork --nopid
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --destination 192.168.122.0/24 --out-in>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --source 192.168.122.0/24 --in-interfac>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --in-interface virbr0 --out-interface v>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --out-interface virbr0 --jump REJECT' f>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --in-interface virbr0 --jump REJECT' fa>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol udp --de>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol tcp --de>
Jan 25 21:37:33 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete OUTPUT --out-interface virbr0 --protocol udp -->
Jan 25 21:37:33 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol udp --de>
Jan 25 21:37:33 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol tcp --de>
I tried reloading firewalld and got the same result. I fired up the firewall applet and suddenly the guests had network access, even though I didn't change anything. I quit the applet and boom, the guests lost network access again. Fired it up once more, but this time the guest access didn't come back.
I don't know if any of this is repeatable. Time to slaughter a chicken by the light of the moon?
poc
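[Editorial aside] To see at a glance which iptables chains the COMMAND_FAILED warnings above refer to, the journal text can be tallied per chain. This is only a sketch: the function name failed_by_chain is made up for illustration, and it assumes the warnings have the same shape as the lines pasted in this message (an iptables command containing "--delete CHAIN").

```shell
# Summarise firewalld COMMAND_FAILED warnings by iptables chain.
# Reads journal text on stdin and prints "<count> <chain>" per chain.
# (failed_by_chain is a hypothetical helper name, not part of firewalld.)
failed_by_chain() {
    grep 'COMMAND_FAILED' |
        sed -n 's/.*--delete \([A-Z]*\).*/\1/p' |   # keep only the chain name
        sort | uniq -c |
        awk '{print $1, $2}'                        # normalise uniq's padding
}

# Typical use on the host (needs permission to read the journal):
#   journalctl -b -u firewalld | failed_by_chain
```

If nearly all failures are in FORWARD and INPUT on virbr0, that would at least be consistent with firewalld trying to delete rules that libvirt had added for the NAT network, though the truncated log lines here do not show the actual failure reason.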
On 1/26/19 6:24 AM, Patrick O'Callaghan wrote:
> I'm 99% sure it has something to do with the firewall. Thing is, I haven't touched the firewall rules. Nevertheless I see this:
> [...]
> I tried reloading firewalld and got the same result. I fired up the firewall applet and suddenly the guests had network access, even though I didn't change anything. I quit the applet and boom, the guests lost network access again. Fired it up once more, but this time the guest access didn't come back.
> I don't know if any of this is repeatable. Time to slaughter a chicken by the light of the moon?
Well, my next suggestion was going to be to disable the firewall as a test. But, I must admit that I was hesitant to suggest that since DHCP seems to have worked.
I just now installed F29 Workstation in a guest using virtio and it is working just fine. FWIW, I only have vnet0 and no vnet1.
No errors show in firewalld status. virbr0 and vnet0 both show as being in the Default Zone: public in the firewall configuration under "Connections". I have no Rich Rules defined.
[egreshko@meimei ~]$ ping 192.168.122.86
PING 192.168.122.86 (192.168.122.86) 56(84) bytes of data.
64 bytes from 192.168.122.86: icmp_seq=1 ttl=64 time=0.238 ms
64 bytes from 192.168.122.86: icmp_seq=2 ttl=64 time=0.251 ms
64 bytes from 192.168.122.86: icmp_seq=3 ttl=64 time=0.377 ms
On Sat, 2019-01-26 at 07:18 +0800, Ed Greshko wrote:
> Well, my next suggestion was going to be to disable the firewall as a test. But, I must admit that I was hesitant to suggest that since DHCP seems to have worked.
> I just now installed F29 Workstation in a guest using virtio and it is working just fine. FWIW, I only have vnet0 and no vnet1.
> No errors show in firewalld status. virbr0 and vnet0 both show as being in the Default Zone: public in the firewall configuration under "Connections". I have no Rich Rules defined.
The plot thickens. First of all, my snippet from wireshark was of course wrong as I was monitoring virbr0 instead of vnet0. Silly me.
Secondly, after a reboot to make sure everything was in default state, I fired up the Fedora guest alone, and lo and behold it worked. Then I fired up the Windows guest. It didn't work. Took it down and now the Fedora guest stopped working. Stopped and restarted both of them and they both work. Then suddenly they don't. Then they do again, or one does and the other doesn't.
While all this is going on, I try ping6 to both of them. It always works, even when ping doesn't.
My theory is that something is messing with DHCP. I'm running dnsmasq but I've been doing that for months. However avahi is also running, so perhaps there's some kind of conflict. And libvirtd apparently also runs its own dnsmasq internally, according to https://wiki.libvirt.org/page/Libvirtd_and_dnsmasq
poc
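[Editorial aside] One way to test the "something is messing with DHCP" theory above is to list any dnsmasq process that was not started from libvirt's own configuration directory. The sketch below is an illustration only: foreign_dnsmasq is a made-up name, and it assumes libvirt's instances reference /var/lib/libvirt/dnsmasq/ on their command line, as in the ps output shown elsewhere in this thread.

```shell
# Print the PIDs of dnsmasq processes NOT started from libvirt's config dir.
# Reads "ps -eo pid,args"-style lines on stdin (PID first, then the command).
# (foreign_dnsmasq is a hypothetical helper name.)
foreign_dnsmasq() {
    awk '$2 ~ /(^|\/)dnsmasq$/ && $0 !~ /\/var\/lib\/libvirt\/dnsmasq\// { print $1 }'
}

# Typical use on the host:
#   ps -eo pid,args | foreign_dnsmasq
```

An empty result would suggest that only libvirt's dnsmasq is answering DHCP on the virtual network; it does not rule out other causes.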
On 1/26/19 7:55 PM, Patrick O'Callaghan wrote:
> [...]
> My theory is that something is messing with DHCP. I'm running dnsmasq but I've been doing that for months. However avahi is also running, so perhaps there's some kind of conflict. And libvirtd apparently also runs its own dnsmasq internally, according to https://wiki.libvirt.org/page/Libvirtd_and_dnsmasq
Well, I only have Fedora guests, and I just switched an existing one to use NAT. Running multiple guests works OK.
I don't run my own instance of dnsmasq, just the ones started by libvirt:
[egreshko@meimei ~]$ ps -eaf | grep dnsmasq
dnsmasq   1357     1  0 20:09 ?        00:00:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root      1358  1357  0 20:09 ?        00:00:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
[egreshko@meimei ~]$ ps -eaf | grep avahi
avahi      760     1  0 20:08 ?        00:00:00 avahi-daemon: running [meimei.local]
avahi      918   760  0 20:08 ?        00:00:00 avahi-daemon: chroot helper
I currently don't have a Windows guest.
On Sat, 2019-01-26 at 20:26 +0800, Ed Greshko wrote:
> I don't run my own instance of dnsmasq. Just these of libvirt
> [...]
Same here. To eliminate some variables, I turned off my dnsmasq service, disabled it and rebooted. The problem is still there: for a few moments the guests are network-reachable, then they aren't. They may come back, they may not. Or one does and the other doesn't. It's completely unpredictable. If I could even figure out which component is causing the problem I could BZ it, but nothing stands out.
I'll keep looking but I'm seriously considering a complete system reinstall, something I haven't done in about 5 years, in case some cruft from earlier iterations of Fedora is somehow lurking in the shadows.
poc
On 1/27/19 7:48 AM, Patrick O'Callaghan wrote:
> Same here. To eliminate some variables, I turned off my dnsmasq service, disabled it and rebooted. The problem is still there: for a few moments the guests are network-reachable, then they aren't. They may come back, they may not. Or one does and the other doesn't. It's completely unpredictable. If I could even figure out which component is causing the problem I could BZ it, but nothing stands out.
> I'll keep looking but I'm seriously considering a complete system reinstall, something I haven't done in about 5 years, in case some cruft from earlier iterations of Fedora is somehow lurking in the shadows.
Well, I can't say that I've ever seen "intermittent" problems like that caused by SW. But since the host and guest are on the same HW it seems to be the only thing that makes sense.
The only thing that comes to mind is that communication on a LAN with IPv4 takes place based on the MAC address and ARP request/response. If the guests somehow obtained the same MAC address for their interfaces, one might see odd behavior.
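[Editorial aside] The duplicate-MAC idea can be checked from the host by collecting the MAC addresses in each guest's libvirt definition and looking for repeats. The following is a sketch under assumptions: dup_macs is a made-up name, and it assumes the domain XML contains lines of the form <mac address='..'/> with single quotes, which is how virsh dumpxml normally prints them.

```shell
# Print every MAC address that occurs more than once in libvirt domain XML.
# Reads the concatenated XML of one or more guests on stdin.
# (dup_macs is a hypothetical helper name.)
dup_macs() {
    sed -n "s/.*<mac address='\([0-9a-fA-F:]*\)'.*/\1/p" |
        tr 'A-F' 'a-f' |    # compare case-insensitively
        sort | uniq -d      # uniq -d prints only repeated values
}

# Typical use on the host:
#   for d in $(virsh list --all --name); do virsh dumpxml "$d"; done | dup_macs
```

No output means every defined guest interface has a distinct MAC address.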
On Sun, 2019-01-27 at 21:50 +0800, Ed Greshko wrote:
> The only thing that comes to mind is that communication on a LAN with IPv4 takes place based on the MAC address and ARP request/response. If somehow guest obtained the same MAC address for their interfaces one may see odd behavior.
Apparently not. The link/ether field is different for all of them:
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP group default qlen 1000
    link/ether d4:3d:7e:f4:1b:08 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.73/24 brd 192.168.1.255 scope global noprefixroute enp3s0
       valid_lft forever preferred_lft forever
    inet6 fe80::d63d:7eff:fef4:1b08/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:8b:88:60 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:8b:88:60 brd ff:ff:ff:ff:ff:ff
5: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq master virbr0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:b0:20:88 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:feb0:2088/64 scope link
       valid_lft forever preferred_lft forever
6: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq master virbr0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:1d:55:89 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe1d:5589/64 scope link
       valid_lft forever preferred_lft forever
Another thing: the gateway address (192.168.122.1) is pingable from both sides, i.e. from the guest and the host, but packets are not being forwarded. net.ipv4.ip_forward is 1 (on), so possibly the problem is at a lower level with the actual bridge. Not sure how I can check that, but note that ip6 packets do go back and forth so it seems unlikely.
The firewall rules are:
$ sudo firewall-cmd --info-zone=public
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp3s0 p3p1 virbr0 virbr0-nic
  sources:
  services: dhcp dhcpv6-client dns mdns mountd nfs rpc-bind rsyncd samba ssh
  ports: 32410/udp 32413/tcp 32412/tcp 8200/tcp 1900/udp 32400/tcp 32469/tcp 32414/tcp 24800/tcp
  protocols:
  masquerade: no
  forward-ports:
  source-ports: 24800/tcp
  icmp-blocks:
  rich rules:
Thanks for your patience in looking at this Ed. Don't feel pressured to keep responding :-)
poc
On 1/27/19 10:45 PM, Patrick O'Callaghan wrote:
> Thanks for your patience in looking at this Ed. Don't feel pressured to keep responding
I only feel pressure to respond to our cats. Oh, and my wife. Besides, who doesn't love a mystery?
When I was talking about the MAC addresses I was speculating that the guests may have the conflict. Not that an interface on the host may.
If you use wireshark to monitor just vnet0 and do an ssh to the guest do you see an ARP request/response happen first? Is it correct?
I'm being distracted by other things at the moment, but just noticed that my FW shows a difference.
[egreshko@meimei .ssh]$ sudo firewall-cmd --info-zone=public
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp2s0 vnet0 wlp4s0
  sources:
  services: dhcpv6-client dns kde-connect mdns ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
On Mon, 2019-01-28 at 06:18 +0800, Ed Greshko wrote:
> If you use wireshark to monitor just vnet0 and do an ssh to the guest do you see an ARP request/response happen first? Is it correct?
[...]
Even without trying the ssh there is a constant traffic of ARP requests with no replies:
52:54:00:b0:20:88 ff:ff:ff:ff:ff:ff ARP 42 Who has 192.168.122.1? Tell 192.168.122.167
52:54:00:8b:88:60 is the vnet0 interface. 192.168.122.1 is the gateway, 192.168.122.167 is the guest.
Nothing ever comes back. IOW the guest is trying to do ARP resolution but nothing is answering it (avahi-daemon is running, as is the libvirt copy of dnsmasq). Also:
$ ip neigh|grep 122
192.168.122.167 dev virbr0 lladdr 52:54:00:b0:20:88 STALE
192.168.122.193 dev virbr0 lladdr 52:54:00:1d:55:89 STALE
Those are the two guest addresses.
> [egreshko@meimei .ssh]$ sudo firewall-cmd --info-zone=public
> [...]
Nothing to remark on there I think. I have some extra ports and services enabled but that's to be expected.
poc
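[Editorial aside] The symptom reported above, ARP requests that never get a reply, can be extracted mechanically from a capture summary. This is a sketch, not a definitive tool: unanswered_arp is a made-up name, and it assumes Wireshark/tshark-style summary lines containing "Who has X? Tell Y" for requests and "X is at MAC" for replies, as in the lines pasted in this thread.

```shell
# List IPs that were asked for ("Who has X?") but never answered ("X is at ...").
# Reads a Wireshark/tshark-style text summary on stdin.
# (unanswered_arp is a hypothetical helper name.)
unanswered_arp() {
    awk '
        /Who has/ {
            for (i = 1; i < NF; i++)
                if ($i == "has") { ip = $(i + 1); sub(/\?$/, "", ip); asked[ip] = 1 }
        }
        / is at / {
            for (i = 2; i < NF; i++)
                if ($i == "is" && $(i + 1) == "at") answered[$(i - 1)] = 1
        }
        END { for (ip in asked) if (!(ip in answered)) print ip }
    '
}

# Typical use on the host (10-second capture on the guest tap device):
#   sudo tshark -i vnet0 -f arp -a duration:10 | unanswered_arp
```

If the gateway address 192.168.122.1 shows up in the output, the host side of the bridge is not answering the guest's ARP requests during the capture window.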
On 1/28/19 7:12 AM, Patrick O'Callaghan wrote:
On Mon, 2019-01-28 at 06:18 +0800, Ed Greshko wrote:
If you use wireshark to monitor just vnet0 and do an ssh to the guest do you see an ARP request/response happen first? Is it correct?
[...]
Even without trying the ssh there is a constant traffic of ARP requests with no replies:
52:54:00:b0:20:88 ff:ff:ff:ff:ff:ff ARP 42 Who has 192.168.122.1? Tell 192.168.122.167
52:54:00:8b:88:60 is the vnet0 interface. 192.168.122.1 is the gateway, 192.168.122.167 is the guest.
Nothing ever comes back. IOW the guest is trying to do ARP resolution but nothing is answering it (avahi-daemon is running, as is the libvirt copy of dnsmasq). Also:
$ ip neigh|grep 122
192.168.122.167 dev virbr0 lladdr 52:54:00:b0:20:88 STALE
192.168.122.193 dev virbr0 lladdr 52:54:00:1d:55:89 STALE
Those are the two guest addresses.
Humm.... I see
37 67.694929326 RealtekU_f3:3f:02 RealtekU_9a:e8:49 ARP 42 Who has 192.168.122.1? Tell 192.168.122.86
38 67.694969398 RealtekU_9a:e8:49 RealtekU_f3:3f:02 ARP 42 192.168.122.1 is at 52:54:00:9a:e8:49
[egreshko@meimei ~]$ ip neigh|grep 122
192.168.122.86 dev virbr0 lladdr 52:54:00:f3:3f:02 REACHABLE
(Prior to an ssh it was STALE even with ARP traffic)
[egreshko@meimei .ssh]$ sudo firewall-cmd --info-zone=public
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp2s0 vnet0 wlp4s0
  sources:
  services: dhcpv6-client dns kde-connect mdns ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
Nothing to remark on there I think. I have some extra ports and services enabled but that's to be expected.
I was noting the difference between yours...
interfaces: enp3s0 p3p1 virbr0 virbr0-nic
and mine
interfaces: enp2s0 vnet0 wlp4s0
On Mon, 2019-01-28 at 08:20 +0800, Ed Greshko wrote:
I was noting the difference between yours...
interfaces: enp3s0 p3p1 virbr0 virbr0-nic
and mine
interfaces: enp2s0 vnet0 wlp4s0
Surely you must have virbr0? Not sure where virbr0-nic comes from but I assume it's created by libvirt.
poc
On Mon, Jan 28, 2019 at 11:07 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
Surely you must have virbr0? Not sure where virbr0-nic comes from but I assume it's created by libvirt.
virbr0's MAC is copied from the first NIC that's attached to it. To ensure that virbr0 has (1) a MAC (if no NIC's attached, it won't have a MAC) and (2) always the same MAC, virbr0-nic is created and attached to virbr0.
On Mon, 2019-01-28 at 12:36 +0100, Tom H wrote:
Surely you must have virbr0? Not sure where virbr0-nic comes from but I assume it's created by libvirt.
virbr0's MAC is copied from the first NIC that's attached to it. To ensure that virbr0 has (1) a MAC (if no NIC's attached, it won't have a MAC) and (2) always the same MAC, virbr0-nic is created and attached to virbr0.
OK, thanks.
poc
On 1/28/19 6:06 PM, Patrick O'Callaghan wrote:
Surely you must have virbr0? Not sure where virbr0-nic comes from but I assume it's created by libvirt.
Sure,
4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:9a:e8:49 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
but, it doesn't show up in results of the firewall-cmd
[root@meimei ~]# firewall-cmd --get-active-zones
public
  interfaces: enp2s0 wlp4s0 vnet0
It does show in the firewall-applet as a connection "virbr0 (Default Zone: public)"
Actually, vnet0 wasn't even there initially; it only appeared after I manually added it to "public". Originally the line read
interfaces: enp2s0 wlp4s0
I've reverted to this condition.
Have you tried with the FW stopped?
On Mon, 2019-01-28 at 19:52 +0800, Ed Greshko wrote:
Actually, vnet0, wasn't even there initially until I manually added it to "public". Originally the line read
interfaces: enp2s0 wlp4s0
What zone was it in, as a matter of interest?
I've reverted to this condition.
Have you tried with the FW stopped?
Well I stopped firewalld, though I don't think that actually stops the FW itself, i.e. iptables in the kernel.
Makes no difference.
poc
On 1/28/19 9:24 PM, Patrick O'Callaghan wrote:
On Mon, 2019-01-28 at 19:52 +0800, Ed Greshko wrote:
Actually, vnet0, wasn't even there initially until I manually added it to "public". Originally the line read
interfaces: enp2s0 wlp4s0
What zone was it in, as a matter of interest?
Oh, I thought I posted this earlier, public.
[egreshko@meimei ~]$ sudo firewall-cmd --info-zone=public
[sudo] password for egreshko:
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp2s0 wlp4s0
  sources:
  services: dhcpv6-client dns kde-connect mdns ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
I've reverted to this condition.
Have you tried with the FW stopped?
Well I stopped firewalld, though I don't think that actually stops the FW itself, i.e. iptables in the kernel.
Makes no difference.
Well, good news and bad news. The good news is that I think I've been able to reproduce the problem. The bad news is that it is late in my day and I've had a bit of a "night cap".
But do this: with the firewall stopped, reboot the guest. Then see if it works. When I reproduced the problem it was necessary to reboot the guest to get it working.
I'll do more in my AM.
On Mon, 2019-01-28 at 21:54 +0800, Ed Greshko wrote:
Have you tried with the FW stopped?
Well I stopped firewalld, though I don't think that actually stops the FW itself, i.e. iptables in the kernel.
Makes no difference.
Well, good news and bad news. The good news is that I think I've been able to reproduce the problem. The bad news is that it is late in my day and I've had a bit of a "night cap".
But do this: with the firewall stopped, reboot the guest. Then see if it works. When I reproduced the problem it was necessary to reboot the guest to get it working.
I did that, and sure enough the guest came back up with net access. Pings worked both ways.
Then after a minute or two it went down again, without me touching the FW. This is consistent (if consistent is the word I want) with what it's been doing for the past couple of days.
I'll do more in my AM.
Thanks again.
poc
On 1/28/19 11:55 PM, Patrick O'Callaghan wrote:
On Mon, 2019-01-28 at 21:54 +0800, Ed Greshko wrote:
I'll do more in my AM.
Thanks again.
Well, yesterday I was able to replicate the symptoms of the problem you're having. I can't say if I actually duplicated it. However, this morning I can't determine the steps I took. The good news is that I know why I saw the same symptoms.
My setup is the Host running F29 and KDE only. Two Guests, one running F29 KDE Only and the other running F29 GNOME only.
Last night, while checking (and maybe changing) things on the Host FW, I found that pings weren't working.
Looking around I found that the F29 GNOME guest had created a virbr0 interface with 192.168.122.1/24 as the address. I didn't think to check routing info on all systems. :-(
Anyway, I did find that a system with F29 installed has all the libvirt packages installed and libvirtd.service enabled. It would seem that the guests are supposed to detect they are guests and not create the bridge. FWIW, I started an F29 GNOME guest under VirtualBox and it does create the bridge.
So, maybe, try disabling libvirtd.service on any guests which may have it enabled and reboot *everything* to see if your problem persists.
On Tue, 2019-01-29 at 06:11 +0800, Ed Greshko wrote:
On 1/28/19 11:55 PM, Patrick O'Callaghan wrote:
On Mon, 2019-01-28 at 21:54 +0800, Ed Greshko wrote:
I'll do more in my AM.
Thanks again.
Well, yesterday I was able to replicate the symptoms of the problem you're having. I can't say if I actually duplicated it. However, this morning I can't determine the steps I took. The good news is that I know why I saw the same symptoms.
My setup is the Host running F29 and KDE only. Two Guests, one running F29 KDE Only and the other running F29 GNOME only.
Last night, while checking (and maybe changing) things on the Host FW, I found that pings weren't working.
Looking around I found that the F29 GNOME guest had created a virbr0 interface with 192.168.122.1/24 as the address. I didn't think to check routing info on all systems. :-(
Anyway, I did find that a system with F29 installed has all the libvirt packages installed and libvirtd.service enabled. It would seem that the guests are supposed to detect they are guests and not create the bridge. FWIW, I started an F29 GNOME guest under VirtualBox and it does create the bridge.
So, maybe, try disabling libvirtd.service on any guests which may have it enabled and reboot *everything* to see if your problem persists.
Interesting, though I wouldn't expect a difference between Gnome and KDE guests. Note that my guest is Fedora Server, with no DE installed.
HOWEVER, (hold the front page!)
Last night I rebooted everything and fired up *only* the Windows guest, and it is working perfectly. Recall that I've always had two guests running, so either a) the Fedora guest is screwing things up somehow, possibly in the way you suggest, or b) libvirt is confused by having two guests. If it's either of those things then something must have changed recently, because this is exactly the setup I've been using for months with no issues, and (I stress again) I have changed nothing in my configuration other than regular dnf updates.
I'll do some more tests and report back.
poc
On 1/29/19 6:39 PM, Patrick O'Callaghan wrote:
Interesting, though I wouldn't expect a difference between Gnome and KDE guests. Note that my guest is Fedora Server, with no DE installed.
The "difference" is if you install Fedora KDE spin from the Live Media it Doesn't Install any libvirt stuff.
If you install Fedora Workstation from the Live Media it Does install ALL the libvirt stuff *and* it enables the libvirtd service. I thought that my previous message made that quite clear.
HOWEVER, (hold the front page!)
Last night I rebooted everything and fired up *only* the Windows guest, and it is working perfectly. Recall that I've always had two guests running, so either a) the Fedora guest is screwing things up somehow, possibly in the way you suggest, or b) libvirt is confused by having two guests. If it's either of those things then something must have changed recently, because this is exactly the setup I've been using for months with no issues, and (I stress again) I have changed nothing in my configuration other than regular dnf updates.
I'll do some more tests and report back.
Well, like I said, check to see if your Fedora Guests have the libvirtd service enabled. The guests don't need it. So, just disable it.
On Tue, 2019-01-29 at 10:39 +0000, Patrick O'Callaghan wrote:
So, maybe, try disabling libvirtd.service on any guests which may have it enabled and reboot *everything* to see if your problem persists.
Interesting, though I wouldn't expect a difference between Gnome and KDE guests. Note that my guest is Fedora Server, with no DE installed.
HOWEVER, (hold the front page!)
Last night I rebooted everything and fired up *only* the Windows guest, and it is working perfectly. Recall that I've always had two guests running, so either a) the Fedora guest is screwing things up somehow, possibly in the way you suggest, or b) libvirt is confused by having two guests. If it's either of those things then something must have changed recently, because this is exactly the setup I've been using for months with no issues, and (I stress again) I have changed nothing in my configuration other than regular dnf updates.
I'll do some more tests and report back.
OK, first of all the Fedora guest doesn't have libvirtd.service enabled, maybe because it was installed with no DE.
Secondly, I did the following:
1) Verified that the Windows guest was still working.
2) Started the Fedora guest.
3) Both guests worked for a few minutes, then both failed.
4) Shut down the Fedora guest. Windows guest still failing.
5) Rebooted the Windows guest (from the virt-manager menu). Still failing.
6) Shut down the Windows guest and restarted it. It's now working.
I think this is a strong indication that the problem is with libvirt itself.
poc
On 1/29/19 7:02 PM, Patrick O'Callaghan wrote:
OK, first of all the Fedora guest doesn't have libvirtd.service enabled, maybe because it was installed with no DE.
Secondly, I did the following:
1) Verified that the Windows guest was still working.
2) Started the Fedora guest.
3) Both guests worked for a few minutes, then both failed.
4) Shut down the Fedora guest. Windows guest still failing.
5) Rebooted the Windows guest (from the virt-manager menu). Still failing.
6) Shut down the Windows guest and restarted it. It's now working.
I think this is a strong indication that the problem is with libvirt itself.
I didn't have a Win10 guest. So, I installed. And tested with a Fedora Guest. Both are still working just fine after
[egreshko@f29g ~]$ uptime
 20:16:43 up 33 min, 2 users, load average: 0.07, 0.02, 0.00
How about putting your libvirt interfaces in their own FW zone with just the basics?
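A rough sketch of what I mean, in case it helps. The zone name and interface names are placeholders you supply, and treating "just the basics" as the dhcp and dns services (so that libvirt's dnsmasq can still answer the guests) is my assumption, not something established in this thread. The function only prints the firewall-cmd calls so they can be read over before running them as root:

```shell
#!/bin/sh
# Print (do not run) the firewall-cmd calls for a dedicated guest zone.
# The zone and interface names are whatever you pass in; "dhcp" and "dns"
# are an assumed reading of "the basics" for libvirt's dnsmasq.
guest_zone_cmds() {
    zone=$1
    shift
    echo "firewall-cmd --permanent --new-zone=$zone"
    for svc in dhcp dns; do
        echo "firewall-cmd --permanent --zone=$zone --add-service=$svc"
    done
    for iface in "$@"; do
        echo "firewall-cmd --permanent --zone=$zone --change-interface=$iface"
    done
    echo "firewall-cmd --reload"
}
```

For example, guest_zone_cmds vmguests virbr0 prints five commands; once they look right they can be run one by one as root.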
On Tue, 2019-01-29 at 20:18 +0800, Ed Greshko wrote:
I didn't have a Win10 guest. So, I installed. And tested with a Fedora Guest. Both are still working just fine after
[egreshko@f29g ~]$ uptime
 20:16:43 up 33 min, 2 users, load average: 0.07, 0.02, 0.00
How about putting your libvirt interfaces in their own FW zone with just the basics?
OK, did that, i.e. just moved each guest to a different zone without changing anything else. And they are now both working (I had to restart the Windows one but not the Fedora one).
If this holds up, it looks like the solution but I'm blowed if I can understand why, given that everything worked correctly without this until a few days ago.
Either way, I owe you a beer or ten, Ed. Many thanks.
poc
On Tue, 2019-01-29 at 14:59 +0000, Patrick O'Callaghan wrote:
On Tue, 2019-01-29 at 20:18 +0800, Ed Greshko wrote:
I didn't have a Win10 guest. So, I installed. And tested with a Fedora Guest. Both are still working just fine after
[egreshko@f29g ~]$ uptime
 20:16:43 up 33 min, 2 users, load average: 0.07, 0.02, 0.00
How about putting your libvirt interfaces in their own FW zone with just the basics?
OK, did that, i.e. just moved each guest to a different zone without changing anything else. And they are now both working (I had to restart the Windows one but not the Fedora one).
If this holds up, it looks like the solution but I'm blowed if I can understand why, given that everything worked correctly without this until a few days ago.
Either way, I owe you a beer or ten, Ed. Many thanks.
And we're back ...
I worked away using the Windows guest for several hours. Network access kept going, though the system felt slightly sluggish at times. When I looked at the Fedora guest (which I hadn't touched in all this time) it was off-line again.
So I'm not convinced the firewall has anything to do with it after all.
poc
On 1/30/19 1:37 AM, Patrick O'Callaghan wrote:
And we're back ...
I worked away using the Windows guest for several hours. Network access kept going, though the system felt slightly sluggish at times. When I looked at the Fedora guest (which I hadn't touched in all this time) it was off-line again.
So I'm not convinced the firewall has anything to do with it after all.
If I were having this problem, I'd disable the FW, reboot everything, and see what happens.
Whatever it is, it seems to be affecting few people as (granted my BZ searches are weak) I could not find any BZ that addresses this. It also seems difficult to reproduce.
I don't discount anything at this point.
On Wed, 2019-01-30 at 09:19 +0800, Ed Greshko wrote:
On 1/30/19 1:37 AM, Patrick O'Callaghan wrote:
And we're back ...
I worked away using the Windows guest for several hours. Network access kept going, though the system felt slightly sluggish at times. When I looked at the Fedora guest (which I hadn't touched in all this time) it was off-line again.
So I'm not convinced the firewall has anything to do with it after all.
If I were having this problem, I'd disable the FW, reboot everything, and see what happens.
Did that. It worked for a while. I just left both guests running for a couple of hours, logged in but not doing anything, and when I came back they were both disconnected.
Whatever it is, it seems to be affecting few people as (granted my BZ searches are weak) I could not find any BZ that addresses this. It also seems difficult to reproduce.
I don't discount anything at this point.
Me neither. I want to try one more thing: leaving the Fedora guest on NAT and changing the Windows guest to macvtap (since I don't need to connect into it).
poc
On Wed, 2019-01-30 at 13:01 +0000, Patrick O'Callaghan wrote:
I want to try one more thing: leaving the Fedora guest on NAT and changing the Windows guest to macvtap (since I don't need to connect into it).
Interesting. I changed the Windows guest to macvtap and didn't touch the Fedora guest. Starting with both guests shut down, the Windows guest comes up (though it doesn't have a local IPv4 address from the host's viewpoint, as expected). However the Fedora guest - still on NAT - doesn't have an IPv4 address either and is completely disconnected.
dnsmasq (libvirt's version) is running normally. Both guests have IPv6 addresses and respond to ping6's.
I'm getting a growing feeling that something is really screwed up with my installation of libvirt. I hesitate to wimp out and reinstall it, but I'm running out of ideas.
poc
On 1/30/19 9:31 PM, Patrick O'Callaghan wrote:
I hesitate to wimp out and reinstall it, but I'm running out of ideas.
Before doing that I think I would create a couple of new F29 VM's and see if the new ones exhibit the same issue as the old ones.
On Wed, 2019-01-30 at 22:47 +0800, Ed Greshko wrote:
On 1/30/19 9:31 PM, Patrick O'Callaghan wrote:
I hesitate to wimp out and reinstall it, but I'm running out of ideas.
Before doing that I think I would create a couple of new F29 VM's and see if the new ones exhibit the same issue as the old ones.
I created a single F29 Server guest using all default settings. Running only that guest, pings work in both directions. When I boot the Windows guest, both guests are disconnected and pings from the host to either of them fail. Even when I shut down the Windows guest, they continue to fail.
Rebooting the F29 guest from within does not fix the problem. Shutting it down from the host and restarting it does fix it.
With Windows shut down: Running F28 and F29 guests fails. Running F29 and a cloned F29 fails. Running F29 and a fresh install of F29 fails.
When the Windows guest is running on its own, it works. Ditto any of the Fedora guests on its own.
In every case, failure occurs not instantly but after a few seconds, as if something is timing out. DHCP leases are an hour by default so I wondered if it could be an ARP cache problem. I rebooted both Fedora guests and ran 'arp' on each one as soon as they came up.
They both show *the same MAC address*: 52:54:00:8b:88:60, which looks at least suspicious. Then after less than a minute I ran 'arp' on each of them again, and they both returned a hardware address of '(incomplete)'.
poc
On 2/2/19 1:55 AM, Patrick O'Callaghan wrote:
They both show *the same MAC address*: 52:54:00:8b:88:60, which looks at least suspicious. Then after less than a minute I ran 'arp' on each of them again, and they both returned a hardware address of '(incomplete)'.
That is rather strange.
If you go to /etc/libvirt/qemu and grep for "mac address" on the xml files are there duplicates?
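Something along these lines would list any MAC that turns up more than once. It is only a sketch: it assumes the domain XML uses the single-quoted <mac address='...'/> form that libvirt normally writes, and dup_macs is just a name I made up.

```shell
#!/bin/sh
# List MAC addresses that appear more than once across the libvirt domain
# XML files in a directory (default: /etc/libvirt/qemu).
dup_macs() {
    dir=${1:-/etc/libvirt/qemu}
    # Pull out every <mac address='...'> attribute, keep only the value,
    # fold to lower case so the comparison is case-insensitive, and
    # print the values that occur more than once.
    grep -hoi "<mac address='[^']*'" "$dir"/*.xml 2>/dev/null |
        cut -d"'" -f2 |
        tr 'A-F' 'a-f' |
        sort |
        uniq -d
}
```

Run it as root as dup_macs (or pass another directory as the argument); no output means no duplicates were found.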
On Fri, 2019-02-01 at 17:55 +0000, Patrick O'Callaghan wrote:
They both show *the same MAC address*: 52:54:00:8b:88:60, which looks at least suspicious.
Correction: the address being shown (while arp is still working) is that of the gateway, so naturally it's the same in both guests.
poc
On 2/2/19 6:13 AM, Patrick O'Callaghan wrote:
On Fri, 2019-02-01 at 17:55 +0000, Patrick O'Callaghan wrote:
They both show *the same MAC address*: 52:54:00:8b:88:60, which looks at least suspicious.
Correction: the address being shown (while arp is still working) is that of the gateway, so naturally it's the same in both guests.
OK, I misunderstood what you were saying.
Yes, arp should return the MAC address of the gateway. And "(incomplete)" does confirm that communication has been lost between the guest and the gateway.
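For anyone who wants to watch for this from the ip neigh side rather than with arp, a small filter along these lines might help. It is only a sketch: it assumes the neighbour state is the last field of each line, and the function name is made up.

```shell
#!/bin/sh
# Read `ip neigh` output on stdin and print only the entries whose last
# field (the neighbour state) is FAILED or INCOMPLETE, i.e. entries for
# which address resolution got no answer.
unresolved_neigh() {
    awk '$NF == "FAILED" || $NF == "INCOMPLETE"'
}
```

On the host that would be used as ip neigh show dev virbr0 | unresolved_neigh, and inside a guest as ip neigh | unresolved_neigh.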
Ahh.....can't think of what to try next.
On Sat, 2019-02-02 at 10:55 +0800, Ed Greshko wrote:
On 2/2/19 6:13 AM, Patrick O'Callaghan wrote:
On Fri, 2019-02-01 at 17:55 +0000, Patrick O'Callaghan wrote:
They both show *the same MAC address*: 52:54:00:8b:88:60, which looks at least suspicious.
Correction: the address being shown (while arp is still working) is that of the gateway, so naturally it's the same in both guests.
OK, I misunderstood what you were saying.
Yes, arp should return the MAC addresses of the gateway. And, incomplete does confirm that communication has been lost between the guest and the gateway.
Ahh.....can't think of what to try next.
Yes, I'm at a loss. Note that in all cases the IPv6 connection keeps working. I'll either BZ it or look for a libvirt list to ask on.
Thanks again Ed.
poc
On Sat, 2019-02-02 at 12:01 +0000, Patrick O'Callaghan wrote:
On Sat, 2019-02-02 at 10:55 +0800, Ed Greshko wrote:
On 2/2/19 6:13 AM, Patrick O'Callaghan wrote:
On Fri, 2019-02-01 at 17:55 +0000, Patrick O'Callaghan wrote:
They both show *the same MAC address*: 52:54:00:8b:88:60, which looks at least suspicious.
Correction: the address being shown (while arp is still working) is that of the gateway, so naturally it's the same in both guests.
OK, I misunderstood what you were saying.
Yes, arp should return the MAC addresses of the gateway. And, incomplete does confirm that communication has been lost between the guest and the gateway.
Ahh.....can't think of what to try next.
Yes, I'm at a loss. Note that in all cases the IPv6 connection keeps working. I'll either BZ it or look for a libvirt list to ask on.
Thanks again Ed.
poc
Last ditch left-field idea: I have a (commercial) VPN service which is not normally turned on but does have a systemd daemon running. I turned it off and everything started working.
I am now looking at 3 Fedora guests and a Windows guest all connected and even able to ping each other.
I think the VPN daemon was messing with the firewall. I'll have to see what to do about that but for now, it looks like this was the culprit all along. Who knew?
Apologies for wasting everyone's time, but maybe there's a lesson here somewhere ...
poc
On 2/2/19 8:22 PM, Patrick O'Callaghan wrote:
Last ditch left-field idea: I have a (commercial) VPN service which is not normally turned on but does have a systemd daemon running. I turned it off and everything started working.
I am now looking at 3 Fedora guests and a Windows guest all connected and even able to ping each other.
I think the VPN daemon was messing with the firewall. I'll have to see what to do about that but for now, it looks like this was the culprit all along. Who knew?
Apologies for wasting everyone's time, but maybe there's a lesson here somewhere ...
Well, it would be good to....
Stop firewalld, dump the IPTables, start the VPN daemon, wait a bit, and dump the IPTables again.
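One way to make the two dumps comparable, sketched under a few assumptions: the dumps come from iptables-save, the helper names are invented, and I don't know the VPN's unit name, so starting it is left to you. The counters and the timestamp comments are stripped first, since they differ between any two dumps:

```shell
#!/bin/sh
# Strip the parts of an iptables-save dump that always change:
# the "# Generated/Completed" comment lines and the [packets:bytes] counters.
normalize_rules() {
    grep -v '^#' "$1" | sed 's/\[[0-9]*:[0-9]*\]//'
}

# Show a unified diff of two dumps after normalizing them.
# Returns 0 if the rule sets match, 1 if they differ.
diff_rules() {
    before=$(mktemp) && after=$(mktemp) || return 2
    normalize_rules "$1" > "$before"
    normalize_rules "$2" > "$after"
    diff -u "$before" "$after"
    status=$?
    rm -f "$before" "$after"
    return $status
}
```

As root: iptables-save > /tmp/before.rules, start the VPN daemon, wait a bit, iptables-save > /tmp/after.rules, then diff_rules /tmp/before.rules /tmp/after.rules. No output means the rule sets match.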
Also, it would be helpful to actually name the commercial VPN which may warn others about the pitfall.
But, it is good to know it is fixed. And yes, the lesson is "list things you've installed that aren't part of the normal distribution".
The other thing that I'd question would be: If you'd made no changes why then did the problem arise? Did a change in some Fedora component become "incompatible" with the VPN daemon?
Ed Greshko writes:
Well, it would be good to....
Stop firewalld, dump the IPTables, start the VPN daemon, wait a bit, and dump the IPTables again.
Also, it would be helpful to actually name the commercial VPN which may warn others about the pitfall.
Pretty sure it's Cisco Anyconnect.
$Work$ is in the process of migrating from Ubuntu 16 to Ubuntu 18. Most of the upgrades are user-initiated. For some reason a lot of people just have to be on the latest Ubuntu LTS, but Cisco's VPN client is similarly misbehaving in Ubuntu 18, for some undiagnosed reason.
Me, I'm fine on Ubuntu 16. Not my laptop, it gets the job done. When it's time to replace it, whatever the standard build IT loads, on their laptops, that's what I'll go with. My only customization is ditching Gnome, and using the XFCE desktop, instead.
On Sat, 2019-02-02 at 09:02 -0500, Sam Varshavchik wrote:
Ed Greshko writes:
Well, it would be good to....
Stop firewalld, dump the IPTables, start the VPN daemon, wait a bit, and dump the IPTables again.
Also, it would be helpful to actually name the commercial VPN which may warn others about the pitfall.
Pretty sure it's Cisco Anyconnect.
No, it's ExpressVPN.
poc
On 2/3/19 1:55 AM, Patrick O'Callaghan wrote:
On Sat, 2019-02-02 at 09:02 -0500, Sam Varshavchik wrote:
Ed Greshko writes:
Well, it would be good to....
Stop firewalld, dump the IPTables, start the VPN daemon, wait a bit, and dump the IPTables again.
Also, it would be helpful to actually name the commercial VPN which may warn others about the pitfall.
Pretty sure it's Cisco Anyconnect.
No, it's ExpressVPN.
Hummm.... They offer a 30-Day money back guarantee. Tempting.
Oh, and 24/7 Support. Thought about asking them?
On Sun, 2019-02-03 at 07:22 +0800, Ed Greshko wrote:
On 2/3/19 1:55 AM, Patrick O'Callaghan wrote:
On Sat, 2019-02-02 at 09:02 -0500, Sam Varshavchik wrote:
Ed Greshko writes:
Well, it would be good to....
Stop firewalld, dump the IPTables, start the VPN daemon, wait a bit, and dump the IPTables again.
Also, it would be helpful to actually name the commercial VPN which may warn others about the pitfall.
Pretty sure it's Cisco Anyconnect.
No, it's ExpressVPN.
Hummm.... They offer a 30-Day money back guarantee. Tempting.
I find them pretty good overall. Not the cheapest but very good performance in my experience. You get what you pay for in this area. They also rate highly in various surveys as regards security, DNS leak prevention etc.
Oh, and 24/7 Support. Thought about asking them?
I've previously asked them about split tunnelling, which they do support on other platforms but not on Linux. They said they would bear it in mind for some future date. In fact my main use-case for the Fedora guest is exactly that: I want to run a VPN within the guest while using a "normal" network connection outside it. This does work, or did until the present issue cropped up.
I may ask them about the current issue but am not very hopeful of a solution. In any case I can work around it.
poc
On 2/3/19 5:15 PM, Patrick O'Callaghan wrote:
I may ask them about the current issue but am not very hopeful of a solution. In any case I can work around it.
In reading their web site, it seems the underlying technology is OpenVPN. So, it isn't clear to me why they would need specialized procedures to start/stop the VPN.
I don't suppose they document what the systemd component is doing?
On Sun, 2019-02-03 at 17:28 +0800, Ed Greshko wrote:
On 2/3/19 5:15 PM, Patrick O'Callaghan wrote:
I may ask them about the current issue but am not very hopeful of a solution. In any case I can work around it.
In reading their web site, it seems the underlying technology is OpenVPN. So, it isn't clear to me why they would need specialized procedures to start/stop the VPN.
I don't suppose they document what the systemd component is doing?
You suppose correctly. The only documentation is a man page and everything interesting in the package is stripped binary. As one would expect. If/when I get round to testing the order-of-execution hypothesis I may talk to the support people again. They were quite responsive last time.
poc
On Sat, 2019-02-02 at 10:02 -0500, Tom Horsley wrote:
On Sat, 2 Feb 2019 21:50:55 +0800 Ed Greshko wrote:
If you'd made no changes why then did the problem arise?
There are some things man was not meant to know :-).
It's a firewall Jim, but not as we know it ...
poc
On Sat, 2019-02-02 at 21:50 +0800, Ed Greshko wrote:
On 2/2/19 8:22 PM, Patrick O'Callaghan wrote:
Last ditch left-field idea: I have a (commercial) VPN service which is not normally turned on but does have a systemd daemon running. I turned it off and everything started working.
I am now looking at 3 Fedora guests and a Windows guest all connected and even able to ping each other.
I think the VPN daemon was messing with the firewall. I'll have to see what to do about that but for now, it looks like this was the culprit all along. Who knew?
Apologies for wasting everyone's time, but maybe there's a lesson here somewhere ...
Well, it would be good to....
Stop firewalld, dump the IPTables, start the VPN daemon, wait a bit, and dump the IPTables again.
I didn't stop firewalld, but iptables with and without the VPN show no difference. Hypothesis: the problem occurs when the guests start *after* the VPN daemon is running (it comes up at boot time), but if I start the guests first then it works. I'll try and get round to testing this when I have time.
Also, it would be helpful to actually name the commercial VPN which may warn others about the pitfall.
ExpressVPN.
But, it is good to know it is fixed. And yes, the lesson is "list things you've installed that aren't part of the normal distribution".
Indeed, but for most people that would be a long list.
The other thing that I'd question would be: If you'd made no changes why then did the problem arise? Did a change in some Fedora component become "incompatible" with the VPN daemon?
Ay, there's the rub. Both libvirt-libs and the ExpressVPN rpm date from last October.
poc
On 2/2/19 4:22 AM, Patrick O'Callaghan wrote:
Last ditch left-field idea: I have a (commercial) VPN service which is not normally turned on but does have a systemd daemon running. I turned it off and everything started working.
I'll bet the vpn is messing with your routes.
On Sat, 2019-02-02 at 14:43 -0800, Mike Wright wrote:
On 2/2/19 4:22 AM, Patrick O'Callaghan wrote:
Last ditch left-field idea: I have a (commercial) VPN service which is not normally turned on but does have a systemd daemon running. I turned it off and everything started working.
I'll bet the vpn is messing with your routes.
The routing table is the same with and without the daemon running, but actually starting the VPN does change it of course:
VPN off:
$ ip route
default via 192.168.1.1 dev enp3s0 proto static metric 100
192.168.1.0/24 dev enp3s0 proto kernel scope link src 192.168.1.73 metric 100
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
VPN on:
$ ip route
0.0.0.0/1 via 10.191.0.141 dev tun0
default via 192.168.1.1 dev enp3s0 proto static metric 100
10.191.0.1 via 10.191.0.141 dev tun0
10.191.0.141 dev tun0 proto kernel scope link src 10.191.0.142
128.0.0.0/1 via 10.191.0.141 dev tun0
185.230.125.203 via 192.168.1.1 dev enp3s0
192.168.1.0/24 dev enp3s0 proto kernel scope link src 192.168.1.73 metric 100
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
These are both with one guest running as well, hence the virbr0 stuff. The 185.230.125.203 address is one of the provider's endpoints.
poc
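A note on reading the "VPN on" table above: the default route is still present, but route selection prefers the most specific matching prefix, and the two /1 routes together cover the whole IPv4 space while being more specific than the /0 default. Here is a minimal Python sketch of that lookup, assuming only the main table is consulted and ignoring metrics and policy rules. The helper name `pick_route` is hypothetical; the prefixes and devices are copied from the table.

```python
import ipaddress

# (prefix, outgoing device) pairs from the "VPN on" output;
# "default" is written as 0.0.0.0/0 and host routes as /32.
ROUTES_VPN_ON = [
    ("0.0.0.0/1", "tun0"),
    ("0.0.0.0/0", "enp3s0"),
    ("10.191.0.1/32", "tun0"),
    ("10.191.0.141/32", "tun0"),
    ("128.0.0.0/1", "tun0"),
    ("185.230.125.203/32", "enp3s0"),
    ("192.168.1.0/24", "enp3s0"),
    ("192.168.122.0/24", "virbr0"),
]


def pick_route(dest, routes):
    """Return (prefix, device) of the longest prefix containing dest."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, dev in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, dev))
    # The /0 default always matches, so matches is never empty here.
    net, dev = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), dev
```

Under these assumptions, any public address falls in one of the /1 routes and goes out via tun0, the provider endpoint 185.230.125.203 keeps its host route via enp3s0, and the guest subnet 192.168.122.0/24 still resolves to virbr0.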
On 1/28/19 7:12 AM, Patrick O'Callaghan wrote:
Even without trying the ssh there is a constant traffic of ARP requests with no replies:
On the host, do you get no responses when you do
arping -I virbr0 192.168.122.167 ?
[egreshko@meimei vnet0]$ arping -I virbr0 192.168.122.86
ARPING 192.168.122.86 from 192.168.122.1 virbr0
Unicast reply from 192.168.122.86 [52:54:00:F3:3F:02] 0.923ms
Unicast reply from 192.168.122.86 [52:54:00:F3:3F:02] 1.009ms
Unicast reply from 192.168.122.86 [52:54:00:F3:3F:02] 0.931ms
to one of my guests
On Mon, 2019-01-28 at 08:28 +0800, Ed Greshko wrote:
On 1/28/19 7:12 AM, Patrick O'Callaghan wrote:
Even without trying the ssh there is a constant traffic of ARP requests with no replies:
On the host, do you get no responses when you do
arping -I virbr0 192.168.122.167 ?
[egreshko@meimei vnet0]$ arping -I virbr0 192.168.122.86
ARPING 192.168.122.86 from 192.168.122.1 virbr0
Unicast reply from 192.168.122.86 [52:54:00:F3:3F:02] 0.923ms
Unicast reply from 192.168.122.86 [52:54:00:F3:3F:02] 1.009ms
Unicast reply from 192.168.122.86 [52:54:00:F3:3F:02] 0.931ms
to one of my guests
$ arping -I virbr0 192.168.122.167 # This is the Fedora guest
ARPING 192.168.122.167 from 192.168.122.1 virbr0
Unicast reply from 192.168.122.167 [52:54:00:B0:20:88] 7.294ms
Unicast reply from 192.168.122.167 [52:54:00:B0:20:88] 0.761ms
Unicast reply from 192.168.122.167 [52:54:00:B0:20:88] 0.828ms
Oddly, the other guest (Windows) doesn't respond.
arping from the Fedora guest to the host also doesn't respond.
One more thing. Running 'arp' or 'ip neigh' on the guest gives no output. It simply sits there.
poc
On 1/28/19 7:12 AM, Patrick O'Callaghan wrote:
On Mon, 2019-01-28 at 06:18 +0800, Ed Greshko wrote:
If you use wireshark to monitor just vnet0 and do an ssh to the guest do you see an ARP request/response happen first? Is it correct?
[...]
Even without trying the ssh there is a constant traffic of ARP requests with no replies:
Sorry for the fragmentation....
These are loaded, yes?
[egreshko@meimei ~]$ lsmod | grep bridge
bridge 200704 0
stp 16384 1 bridge
llc 16384 2 bridge,stp
On Mon, 2019-01-28 at 08:32 +0800, Ed Greshko wrote:
On 1/28/19 7:12 AM, Patrick O'Callaghan wrote:
On Mon, 2019-01-28 at 06:18 +0800, Ed Greshko wrote:
If you use wireshark to monitor just vnet0 and do an ssh to the guest do you see an ARP request/response happen first? Is it correct?
[...]
Even without trying the ssh there is a constant traffic of ARP requests with no replies:
Sorry for the fragmentation....
These are loaded, yes?
[egreshko@meimei ~]$ lsmod | grep bridge
bridge 200704 0
stp 16384 1 bridge
llc 16384 2 bridge,stp
Yes.
poc
On 1/26/19 6:24 AM, Patrick O'Callaghan wrote:
$ systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2019-01-25 21:37:32 GMT; 42min ago
Docs: man:firewalld(1)
Main PID: 2421 (firewalld)
Tasks: 3 (limit: 4915)
Memory: 28.2M
CGroup: /system.slice/firewalld.service
└─2421 /usr/bin/python3 /usr/sbin/firewalld --nofork --nopid

Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --destination 192.168.122.0/24 --out-in>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --source 192.168.122.0/24 --in-interfac>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --in-interface virbr0 --out-interface v>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --out-interface virbr0 --jump REJECT' f>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --in-interface virbr0 --jump REJECT' fa>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol udp --de>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol tcp --de>
Jan 25 21:37:33 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete OUTPUT --out-interface virbr0 --protocol udp -->
Jan 25 21:37:33 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol udp --de>
Jan 25 21:37:33 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol tcp --de>
Also, note that you won't get those warning messages if you restart your firewall with libvirtd stopped. I discovered that when reproducing your issue.
On Mon, 2019-01-28 at 22:20 +0800, Ed Greshko wrote:
On 1/26/19 6:24 AM, Patrick O'Callaghan wrote:
$ systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2019-01-25 21:37:32 GMT; 42min ago
Docs: man:firewalld(1)
Main PID: 2421 (firewalld)
Tasks: 3 (limit: 4915)
Memory: 28.2M
CGroup: /system.slice/firewalld.service
└─2421 /usr/bin/python3 /usr/sbin/firewalld --nofork --nopid

Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --destination 192.168.122.0/24 --out-in>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --source 192.168.122.0/24 --in-interfac>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --in-interface virbr0 --out-interface v>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --out-interface virbr0 --jump REJECT' f>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete FORWARD --in-interface virbr0 --jump REJECT' fa>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol udp --de>
Jan 25 21:37:32 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol tcp --de>
Jan 25 21:37:33 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete OUTPUT --out-interface virbr0 --protocol udp -->
Jan 25 21:37:33 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol udp --de>
Jan 25 21:37:33 bree firewalld[2421]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -w --table filter --delete INPUT --in-interface virbr0 --protocol tcp --de>
Also, note that you won't get those warning messages if you restart your firewall with libvirtd stopped. I discovered that when reproducing your issue.
Presumably there won't be a virbr0 with libvirt stopped.
poc