Well, after much experimentation I've come up with a multi-pronged solution that keeps load in the single digits throughout the entire certmonger startup process.

First, I've learned more about zram swap, namely that the size you specify is the virtual swap size that gets created, not the physical RAM used. While FreeIPA is running and certmonger's forked processes are consuming memory, I've observed a ~2.8:1 memory saving, which comes from both page compression and duplicate pages not being stored more than once. So while swap usage peaked at ~1.3GB, the swap only occupied ~462MB of physical memory. This is important because it means I can use more zram swap and avoid using a swapfile on the SD card entirely.

However, Fedora's default zram swap configuration doesn't let you configure a swap size larger than physical memory: it expects a factor X and allocates 1/X of memory to zram swap, and because the script uses Bash integer arithmetic you can't specify a fractional factor (e.g. 0.5). So I copied the zram startup script to /opt and changed it to honor a different config parameter that specifies the size directly, then used 'systemctl edit zram-swap.service' to override ExecStart so the service runs the modified script. That let me allocate a 2GB zram swap.
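
As a rough sketch (not part of the setup described here), the ~2.8:1 figure can be reproduced from the two numbers above, and checked on a live system from the kernel's zram statistics. The helper name zram_ratio and the device name zram0 are assumptions for illustration:

```shell
# Ratio of uncompressed data held in zram to the RAM actually used for it.
# Both arguments must be in the same unit (MB, bytes, ...).
zram_ratio() {
    awk -v orig="$1" -v used="$2" 'BEGIN { printf "%.1f\n", orig / used }'
}

# Figures from this message: ~1.3GB of swap in use, ~462MB of RAM backing it.
zram_ratio 1300 462     # prints 2.8

# Live figures, if the device exists (zram0 is an assumption).
# mm_stat begins with: orig_data_size compr_data_size mem_used_total (bytes).
if [ -r /sys/block/zram0/mm_stat ]; then
    read -r orig compr used rest < /sys/block/zram0/mm_stat
    if [ "${used:-0}" -gt 0 ]; then
        zram_ratio "$orig" "$used"
    fi
fi
```

Using mem_used_total rather than compr_data_size includes zram's own allocator overhead, which is closer to the "physical memory usage of the swap" quoted above.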

'systemctl edit zram-swap' :
[Service]
ExecStart=
ExecStart=/opt/zramstart

Diff of /opt/zramstart
# diff -bB /usr/sbin/zramstart /opt/zramstart
14a15
> [ -z "$SIZE" ] || zram_size=$SIZE
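
To illustrate why the factor-based setting cannot express "more than physical memory", here is a minimal sketch of shell integer arithmetic. The function and variable names are hypothetical and are not copied from /usr/sbin/zramstart; only the integer-only behaviour of $(( )) is the point:

```shell
# Prints mem_kb / factor, or returns non-zero if the shell cannot evaluate it.
# Runs in a subshell so an arithmetic error cannot abort the caller.
size_for_factor() {
    ( echo "$(( $1 / $2 ))" ) 2>/dev/null
}

mem_kb=1048576   # 1GB of RAM expressed in KiB (illustrative value)

# An integer factor works: 1/2 of RAM.
echo "factor 2   -> $(size_for_factor "$mem_kb" 2) KiB"

# A fractional factor (which would mean twice RAM) is rejected.
if size_for_factor "$mem_kb" 0.5 >/dev/null; then
    echo "factor 0.5 -> accepted"
else
    echo "factor 0.5 -> rejected by shell arithmetic"
fi
```

Passing an absolute size through a separate variable, as the one-line diff above does with SIZE, sidesteps the division entirely.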


Second, I used 'systemctl edit certmonger.service' to add a drop-in that sets CPUQuota for certmonger, which prevents it from clobbering all the other normal processes when it fork bombs:
[Service]
CPUQuota=20%
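
For reference, systemd interprets the CPUQuota percentage relative to one CPU, so 20% means at most 200ms of CPU time per second of wall-clock time for the whole unit, forked children included. A small sketch of that arithmetic, plus a way to read the applied value back (the helper name quota_ms_per_sec is hypothetical):

```shell
# CPU time in milliseconds that a unit may consume per second of
# wall-clock time for a given CPUQuota percentage.
quota_ms_per_sec() {
    echo $(( $1 * 1000 / 100 ))
}

quota_ms_per_sec 20     # prints 200

# On a systemd host the effective setting can be read back; this is
# skipped where systemctl is not available.
if command -v systemctl >/dev/null 2>&1; then
    systemctl show certmonger.service -p CPUQuotaPerSecUSec || true
fi
```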

Third, I disabled the certmonger service so it doesn't auto-start at boot, and instead created a systemd timer, certmonger.timer, that starts it 5 minutes after boot so that everything else can start up before the system gets hammered:
[Unit]
Description=Run certmonger after boot settles down

[Timer]
OnBootSec=5min

[Install]
WantedBy=timers.target
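
A timer with no Unit= setting activates the service of the same name, so certmonger.timer starts certmonger.service. The steps to put this in place might look like the sketch below. It assumes the timer text above was saved as /etc/systemd/system/certmonger.timer (the path is an assumption), and the function name is hypothetical:

```shell
# Sketch of the enable/disable steps; run as root on the replica.
delay_certmonger_start() {
    systemctl daemon-reload                 # pick up the new timer unit
    systemctl disable certmonger.service    # no direct start at boot
    systemctl enable certmonger.timer       # timer starts it 5 min after boot
}

# Call it explicitly when ready:
#   delay_certmonger_start
#
# To confirm after the next reboot:
#   systemctl list-timers certmonger.timer
```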

All of these changes *should* survive any system updates as well, since no packaged systemd unit files or scripts were edited directly, so that's an added bonus of not having to remember to re-tweak things after an update.

With all of the above changes, I'm able to boot, the FreeIPA services all start as normal (except certmonger), then a few minutes later certmonger starts, and load never goes above 10 (mostly around 5) until certmonger's forked processes all finally finish. That takes about an hour, but it's ~2x faster than letting it run with no CPUQuota, where load gets over 40 in short order and the system becomes mostly unresponsive. That is true even with the modified zram swap; without it, the system simply runs out of memory unless I add a swapfile, which hurts performance even more.

Even during certmonger startup, DNS/LDAP/etc are responsive, and thus the Pi is usable for our purposes as a local replica: it ensures that offices that lack a full fat FreeIPA installation on real server hardware won't become useless if their VPN connection to a site that does have a full installation goes down. Ensuring local redundancy so that regular work can continue as normal if there's a network outage is the goal of using the Pi after all, and thus the desired result has been achieved after some tinkering.

On Thu, May 16, 2019 at 5:37 PM Jonathan Vaughn <jonathan@creatuity.com> wrote:
The many certmonger processes exceed the available RAM (Pi 3 having 1GB) by a wide margin and cause heavy swapping as they all try to run at once, and the heavy swapping itself is the reason load gets so high. If it was one at a time they might still encounter some swapping (or might not, but it should be doable with just zram swap instead of needing physical swap, which would mean minimal load hit). I don't know if they wait on a lock at some point, but they're definitely all kicking off at nearly the same time and even if they end up pausing when they reach a certain point the system spends a long time swapping constantly trying to load all of the processes into memory at once. 

I haven't timed it but it takes at least double digit minutes for load to recover from 30+ to a "normal" load of less than 5 (at idle, with the other non-CA FreeIPA services running, and minimal activity, load is around 4 +- a bit). 

If we can't find a solution to tame certmonger's behavior I am considering just scheduling certmonger to run once a day or week or whatever at a preset time outside the normal operating hours for the office that the Pi happens to be located in, which would at least reduce the impact to just being very annoying.

On Wed, May 15, 2019 at 9:00 PM Fraser Tweedale <ftweedal@redhat.com> wrote:
On Wed, May 15, 2019 at 05:15:38PM -0400, Rob Crittenden via FreeIPA-users wrote:
> Jonathan Vaughn via FreeIPA-users wrote:
> > I previously had tested FreeIPA running on a Raspberry Pi 3B+ and as
> > long as I didn't run the Dogtag server on it performance seemed
> > acceptable for the purpose. These are only being used as local
> > DNS/LDAP/Krb5 replicas, everything also runs on both physical x86_64 and
> > VM x86_64 servers as well in more than one location.
>
> It is STRONGLY not recommended to run IPA in production on *Pi. If you
> have you and your wife on some local LAN then maybe.
>
> > However now that I'm trying to set up Pis for actual use (previously had
> > set up a test environment to validate using them) I'm running into major
> > performance issues once certmonger starts. Using a systemd timer to
> > delay start until everything else starts at least lets everything else
> > FreeIPA related start up and work, but once certmonger starts it still
> > hammers the system using tons of memory and causing lots of swapping.
> >
> > Is there any reason for it to spawn so many processes all at once,
> > versus doing them in a more serial fashion? And did something change in
> > FreeIPA/certmonger behavior in the last year that would cause such a
> > performance regression in memory limited scenarios? Previously I just
> > had zram swap and it was fine, now I have to replace that with actual
> > swap on storage.
>
> Hard to say since you include no version information.
>
> > Also, there's currently no certs needing renewal or anything on this
> > system, so why does it even spawn so many processes ? 
> >
> > root      1699     1  0 03:55 ?        00:00:00 /usr/sbin/certmonger -S
> > -p /var/run/certmonger.pid -n
> > root      1720  1699  0 03:55 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > root      1721  1699  0 03:55 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > root      1722  1699  0 03:55 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > root      1723  1699  0 03:55 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > root      1724  1699  0 03:55 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > root      1725  1699  0 03:55 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > root      1726  1699  0 03:55 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > root      1727  1699  0 03:55 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
> > root      1742  1699  0 03:55 ?        00:00:00
> > /usr/libexec/certmonger/dogtag-ipa-renew-agent-submit
> > root      1759  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > root      1761  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > root      1762  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > root      1763  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > root      1764  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > root      1765  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > root      1767  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > root      1768  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
> > root      1769  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > root      1770  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > root      1771  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > root      1772  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > root      1773  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > root      1774  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > root      1775  1699  0 03:56 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> > root      1776  1699  0 03:57 ?        00:00:00 /usr/bin/python3 -E
> > /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
> >
> > Eventually these complete and things settle down but it takes a very
> > long time, and without delaying certmonger until after the rest of
> > FreeIPA it can cause various IPA services to take so long that they die
> > and fail to start.
>
> On startup certmonger examines all the certs to see if, for example, the
> roots have changed. There are all the processes because there is one per
> tracked cert I assume. There is serialization in the IPA certmonger
> config (ipa-server-guard) so they go one at a time.
>
Do they busy-wait on the lock?  Maybe that is why the load is so
high?

I echo Rob's comments about Raspberry Pi.  For sure there is room to
improve performance, but a future where FreeIPA runs well on such
low-spec machines... it is hard to imagine, and not something we're
aiming for.

Thanks,
Fraser


> rob