As discussed previously, I would like to change the crawler to crawl each category separately. The goal is to reduce the load on the database by spreading the crawling more evenly over the whole day, and to reduce the chance of mirrors being disabled because of high database load.
This should also remove the need for mirror administrators to create multiple hosts in MirrorManager to work around the 4-hour timeout per host.
Attached is my patch. Please +1. This affects mm-crawler01 and mm-crawler02.
Adrian
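To illustrate the idea of spreading per-category crawls over the day, here is a minimal sketch. It is not taken from the attached patch: the category names are placeholders, the function name is hypothetical, and evenly spaced start times are an assumption about how the crawls might be distributed.

```python
from datetime import timedelta

# Placeholder category names, for illustration only.
CATEGORIES = ["linux", "epel", "archive", "other"]


def staggered_start_offsets(categories, period=timedelta(hours=24)):
    """Return {category: offset from the start of the period}, evenly spaced."""
    if not categories:
        return {}
    step = period / len(categories)
    return {name: step * i for i, name in enumerate(categories)}


if __name__ == "__main__":
    for name, offset in staggered_start_offsets(CATEGORIES).items():
        print(f"{name}: start crawl at +{offset}")
```

With four categories and a 24-hour period, each crawl would start six hours after the previous one, so no two categories begin hitting the database at the same time.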
+1
On 19 April 2018 at 02:54, Adrian Reber adrian@lisas.de wrote:
[...]
infrastructure mailing list -- infrastructure@lists.fedoraproject.org To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org
On Thu, Apr 19, 2018 at 08:54:21AM +0200, Adrian Reber wrote:
[...]
Thanks for the '+1's. This change has been active since yesterday and so far it seems to work. If we still see hosts timing out, especially when crawling the archive hosts, we can increase the timeout for archive crawling to 6 hours. Another option to reduce the number of wrongly auto-deactivated mirrors is to increase CRAWLER_AUTO_DISABLE from 4 to 6 or 8 crawls.
I will look at those changes after the freeze.
Adrian
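A rough sketch of how a threshold like CRAWLER_AUTO_DISABLE affects auto-deactivation follows. It assumes the setting counts consecutive failed crawls and that a successful crawl resets the counter; the function name and return shape are hypothetical and not MirrorManager's actual code.

```python
# Value mentioned in the thread; 6 or 8 are the proposed alternatives.
CRAWLER_AUTO_DISABLE = 4


def update_failure_state(consecutive_failures, crawl_succeeded,
                         threshold=CRAWLER_AUTO_DISABLE):
    """Return (new_failure_count, should_disable) after one crawl.

    Assumption: a successful crawl resets the counter, and the host is
    disabled once the number of consecutive failures reaches the threshold.
    """
    if crawl_succeeded:
        return 0, False
    failures = consecutive_failures + 1
    return failures, failures >= threshold


def disabled_after(results, threshold=CRAWLER_AUTO_DISABLE):
    """Replay a list of crawl outcomes (True = success) for one host."""
    failures = 0
    for ok in results:
        failures, disable = update_failure_state(failures, ok, threshold)
        if disable:
            return True
    return False
```

Under these assumptions, a host that times out on four crawls in a row is disabled with a threshold of 4 but stays active with a threshold of 6, which is the effect the proposed increase is aiming for.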
On Thu, Apr 19, 2018 at 08:54:21AM +0200, Adrian Reber wrote:
[...]
On the mirror-admin mailing list there have been reports that mirrors are no longer part of the mirrorlist/metalink.
With the switch to category-based crawling, a bug in the crawler was uncovered:
https://github.com/fedora-infra/mirrormanager2/issues/249
It has been fixed, and the crawler should now mark directories correctly during a crawl.
Adrian