Well I've come up with a multi-pronged solution, after much
experimentation, that keeps load in the single digits throughout the entire
certmonger startup process.
First, I've learned more about zram swap, namely that the size
specification is not the physical RAM used but the virtual swap size
created. From observation I've found a ~2.8:1 savings in memory, from the
combination of page compression and duplicate pages not being stored
multiple times, while running FreeIPA with certmonger's forked processes
consuming memory. So while the swap usage peaked at ~1.3GB the physical
memory usage of the swap was only ~462MB. This is important because it
means I can use more zram swap and avoid using a swapfile on the SD card
entirely.
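For anyone wanting to check the ratio on their own system, here is a rough sketch. The helper name zram_ratio is made up for illustration, and it assumes the device is zram0 and that the kernel exposes /sys/block/zram0/mm_stat, where the first field is the uncompressed data size and the third is the total memory used. The first call just redoes the arithmetic with the approximate figures above, in MB.

```shell
# zram_ratio ORIG MEM_USED -> prints ORIG/MEM_USED with one decimal place.
# Hypothetical helper; both arguments must be in the same unit.
zram_ratio() {
    awk -v o="$1" -v m="$2" 'BEGIN { if (m > 0) printf "%.1f\n", o / m; else print "n/a" }'
}

# Approximate figures from this message: ~1300MB swapped, ~462MB of RAM used.
zram_ratio 1300 462

# On a live system, assuming the device is zram0:
if [ -r /sys/block/zram0/mm_stat ]; then
    # Fields: orig_data_size compr_data_size mem_used_total (rest ignored)
    read -r orig compr used _ < /sys/block/zram0/mm_stat
    zram_ratio "$orig" "$used"
fi
```

The ratio against mem_used_total is the one that matters for sizing, since it includes allocator overhead on top of the compressed data.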
However, Fedora's zram swap configuration method by default doesn't allow
you to configure a swap size larger than physical memory, as it's expecting
you to provide a factor X which it then uses to allocate 1/X of memory to
zram swap, and because it uses bash integer arithmetic you can't specify
decimals (i.e. 0.5). So I copied the zram startup script to /opt and changed
it to use a different config parameter that directly specifies memory size,
and used 'systemctl edit zram-swap.service' to override its ExecStart to
use the modified script, allowing me to allocate a 2GB zram swap.
'systemctl edit zram-swap' :
[Service]
ExecStart=
ExecStart=/opt/zramstart
Diff of /opt/zramstart
# diff -bB /usr/sbin/zramstart /opt/zramstart
14a15
> [ -z "$SIZE" ] || zram_size=$SIZE
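To illustrate why the factor approach can't go above physical memory and what the one-line override changes, here is a minimal sketch. This is not the real zramstart; zram_size_for and its arguments are hypothetical stand-ins, and the numbers are unit-agnostic placeholders.

```shell
# Hypothetical stand-in for the sizing logic; not the real zramstart.
# $1 = total memory, $2 = integer divisor (the "factor"),
# $3 = optional explicit size, all in the same (arbitrary) unit.
zram_size_for() {
    zram_size=$(( $1 / $2 ))
    # Same shape as the one-line patch: an explicit size wins if set.
    [ -z "$3" ] || zram_size=$3
    echo "$zram_size"
}

zram_size_for 1024 2          # factor 2 -> half of memory
zram_size_for 1024 2 2048     # explicit size, larger than memory

# A fractional factor produces no usable size, because shell
# arithmetic only handles integers.
out=$( zram_size_for 1024 0.5 2>/dev/null ) || true
[ -n "$out" ] || echo "factor 0.5 did not yield a size"
```

With an integer divisor the smallest usable factor is 1, so the result can never exceed total memory; the explicit size bypasses the division entirely.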
Second, I used 'systemctl edit certmonger.service' to modify Certmonger's
service file to specify CPUQuota to prevent it from clobbering all the
other normal processes when it fork bombs:
[Service]
CPUQuota=20%
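If you'd rather script this than use the interactive editor, 'systemctl edit UNIT' just creates a drop-in at /etc/systemd/system/UNIT.d/override.conf, so something like the following sketch should be equivalent. The write_dropin helper is hypothetical, and the optional third argument exists only so it can be tried against a scratch directory.

```shell
# write_dropin UNIT BODY [BASE_DIR]
# Hypothetical helper: writes BODY to BASE_DIR/UNIT.d/override.conf,
# the file that 'systemctl edit UNIT' creates.
# BASE_DIR defaults to /etc/systemd/system.
write_dropin() {
    dropin_dir="${3:-/etc/systemd/system}/$1.d"
    mkdir -p "$dropin_dir" || return 1
    printf '%s\n' "$2" > "$dropin_dir/override.conf"
}

# Example (as root), followed by a reload so systemd picks it up:
#   write_dropin certmonger.service "$(printf '[Service]\nCPUQuota=20%%')"
#   systemctl daemon-reload
#   systemctl show certmonger.service -p CPUQuotaPerSecUSec
```

Note that CPUQuota percentages are relative to a single CPU, so 20% on the four-core Pi 3 is a fairly tight cap.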
Third, I disabled the certmonger service so it doesn't auto-start at boot
and instead created a systemd timer, certmonger.timer, that starts it 5
minutes after boot so everything else can start up first before the system
gets hammered:
[Unit]
Description=Run certmonger after boot settles down
[Timer]
OnBootSec=5min
[Install]
WantedBy=timers.target
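As a sketch of how the timer could be put in place from a script: the function name install_certmonger_timer is made up, and the directory argument is only there so it can be tried outside /etc. A timer with no Unit= setting activates the service of the same name, which is why nothing else is needed in the unit.

```shell
# install_certmonger_timer [UNIT_DIR]
# Hypothetical helper: writes the timer unit shown above into UNIT_DIR
# (defaults to /etc/systemd/system).
install_certmonger_timer() {
    unit_dir=${1:-/etc/systemd/system}
    cat > "$unit_dir/certmonger.timer" <<'EOF'
[Unit]
Description=Run certmonger after boot settles down

[Timer]
OnBootSec=5min

[Install]
WantedBy=timers.target
EOF
}

# As root:
#   install_certmonger_timer
#   systemctl daemon-reload
#   systemctl disable certmonger.service
#   systemctl enable certmonger.timer
#   systemctl list-timers certmonger.timer
```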
All of these changes *should* survive any system updates as well since no
systemd or similar files were edited directly, so that's an added bonus of
not having to remember to re-tweak things after an update.
With all of the above changes, I'm able to boot, FreeIPA services all start
as normal (except certmonger), then a few minutes later certmonger starts,
and load never goes above 10, staying mostly around 5, until certmonger's
forked processes all finally finish. It takes about an hour, but that's ~2x
faster than letting it try to complete with no CPUQuota, where load gets
over 40 in short order and the system becomes mostly unresponsive. That's
true even with the modified zram swap; without it, the system simply runs
out of memory unless I add a swapfile, which kills performance even more.
Even during certmonger startup, DNS/LDAP/etc are responsive and thus the Pi
is usable for our purposes as a local replica to ensure that offices that
lack a full-fat FreeIPA installation on real server hardware won't become
useless if their VPN connection to a site that does have a full
installation goes down. Ensuring local redundancy so that regular work can
continue as normal if there's a network outage is the goal of using the Pi
after all, and thus the desired result has been achieved after some
tinkering.
On Thu, May 16, 2019 at 5:37 PM Jonathan Vaughn <jonathan(a)creatuity.com>
wrote:
The many certmonger processes exceed the available RAM (Pi 3 having 1GB)
by a wide margin and cause heavy swapping as they all try to run at once,
and the heavy swapping itself is the reason load gets so high. If they ran
one at a time they might still encounter some swapping (or might not), but
it should be doable with just zram swap instead of needing physical swap,
which would mean minimal load hit. I don't know if they wait on a lock at
some point, but they're definitely all kicking off at nearly the same time,
and even if they end up pausing when they reach a certain point, the system
spends a long time swapping constantly trying to load all of the processes
into memory at once.
I haven't timed it but it takes at least double digit minutes for load to
recover from 30+ to a "normal" load of less than 5 (at idle, with the other
non-CA FreeIPA services running, and minimal activity, load is around 4 +-
a bit).
If we can't find a solution to tame certmonger's behavior I am considering
just scheduling certmonger to run once a day or week or whatever at a
preset time outside the normal operating hours for the office that the Pi
happens to be located in, which would at least reduce the impact to just
being very annoying.
On Wed, May 15, 2019 at 9:00 PM Fraser Tweedale <ftweedal(a)redhat.com>
wrote:
> On Wed, May 15, 2019 at 05:15:38PM -0400, Rob Crittenden via
> FreeIPA-users wrote:
> > Jonathan Vaughn via FreeIPA-users wrote:
> > > I previously had tested FreeIPA running on a Raspberry Pi 3B+ and as
> > > long as I didn't run the Dogtag server on it performance seemed
> > > acceptable for the purpose. These are only being used as local
> > > DNS/LDAP/Krb5 replicas, everything also runs on both physical x86_64 and
> > > VM x86_64 servers as well in more than one location.
> >
> > It is STRONGLY not recommended to run IPA in production on *Pi. If you
> > have you and your wife on some local LAN then maybe.
> >
> > > However now that I'm trying to set up Pis for actual use (previously had
> > > set up a test environment to validate using them) I'm running into major
> > > performance issues once certmonger starts. Using a systemd timer to
> > > delay start until everything else starts at least lets everything else
> > > FreeIPA related start up and work, but once certmonger starts it still
> > > hammers the system using tons of memory and causing lots of swapping.
> > >
> > > Is there any reason for it to spawn so many processes all at once,
> > > versus doing them in a more serial fashion? And did something change in
> > > FreeIPA/certmonger behavior in the last year that would cause such a
> > > performance regression in memory limited scenarios? Previously I just
> > > had zram swap and it was fine, now I have to replace that with actual
> > > swap on storage.
> >
> > Hard to say since you include no version information.
> >
> > > Also, there's currently no certs needing renewal or anything on this
> > > system, so why does it even spawn so many processes ?
> > >
> > > root 1699 1 0 03:55 ? 00:00:00 /usr/sbin/certmonger -S -p /var/run/certmonger.pid -n
> > > root 1720 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > > root 1721 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > > root 1722 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > > root 1723 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > > root 1724 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > > root 1725 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > > root 1726 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > > root 1727 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > > root 1742 1699 0 03:55 ? 00:00:00 /usr/libexec/certmonger/dogtag-ipa-renew-agent-submit
> > > root 1759 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > > root 1761 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > > root 1762 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > > root 1763 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > > root 1764 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > > root 1765 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > > root 1767 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > > root 1768 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > > root 1769 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > > root 1770 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > > root 1771 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > > root 1772 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > > root 1773 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > > root 1774 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > > root 1775 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > > root 1776 1699 0 03:57 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > >
> > > Eventually these complete and things settle down but it takes a very
> > > long time, and without delaying certmonger until after the rest of
> > > FreeIPA it can cause various IPA services to take so long that they die
> > > and fail to start.
> >
> > On startup certmonger examines all the certs to see if, for example, the
> > roots have changed. There are all these processes because there is one per
> > tracked cert, I assume. There is serialization in the IPA certmonger
> > config (ipa-server-guard) so they go one at a time.
> >
> Do they busy-wait on the lock? Maybe that is why the load is so
> high?
>
> I echo Rob's comments about Raspberry Pi. For sure there is room to
> improve performance, but a future where FreeIPA runs well on such
> low-spec machines... it is hard to imagine, and not something we're
> aiming for.
>
> Thanks,
> Fraser
>
>
> > rob