Greetings. I'm the Mirror Wrangler for the Fedora Project, and author
of the tool we use to know which mirrors are up-to-date:
MirrorManager. We have recently begun using rsync to retrieve the
list of files on a particular mirror, if you've registered rsync URLs
in MirrorManager (which you have).
For most mirrors, rsync directory listings of each of the four
Category trees you're carrying (Fedora Linux, Fedora EPEL, Fedora
Archive, and Fedora Secondary Arches) are the fastest way to get a full
list of what content your mirror has. If rsync isn't available, the
crawler falls back to FTP DIR listings, and then to individual HTTP
HEAD requests on a subset of the files.
In your case, the crawler takes a relatively long time to retrieve the
directory listing from your mirror. By category, I see something like this:
07/26/2013 09:00:15 PM Starting crawl
07/26/2013 09:00:15 PM scanning Category Fedora Linux
07/26/2013 10:19:29 PM rsync time: 1:16:20.570514
07/26/2013 10:25:35 PM scanning Category Fedora EPEL
07/26/2013 10:30:06 PM rsync time: 0:04:21.676678
07/26/2013 10:30:48 PM scanning Category Fedora Secondary Arches
(during which the 2-hour cumulative timeout expires and the crawler
is killed; it never completes Fedora Secondary Arches).
This is what we're running:
rsync --temp-dir=/tmp -r --exclude=.snapshot --exclude='*.~tmp~' --no-motd \
    rsync://ftp.icm.edu.pl/pub/Linux/fedora/linux/
rsync --temp-dir=/tmp -r --exclude=.snapshot --exclude='*.~tmp~' --no-motd \
    rsync://ftp.icm.edu.pl/pub/Linux/fedora/linux/epel/
rsync --temp-dir=/tmp -r --exclude=.snapshot --exclude='*.~tmp~' --no-motd \
    rsync://ftp.icm.edu.pl/pub/Linux/dist/fedora-secondary/
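If you want to reproduce the timing from your side, something along these lines might help. It is only a rough sketch: the time_listing helper is a hypothetical name, not part of MirrorManager, and it assumes a date command that supports +%s. It runs the same kind of listing as above and reports how long it took.

```shell
# Hypothetical helper: time a recursive rsync listing and print elapsed seconds.
# With a source and no destination, rsync only lists the files.
time_listing() {
    start=$(date +%s)
    rsync --temp-dir=/tmp -r --exclude=.snapshot --exclude='*.~tmp~' --no-motd "$1" > /dev/null
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) seconds (rsync exit status $status)"
}

# Usage (this contacts the mirror, so it is left commented out):
# time_listing rsync://ftp.icm.edu.pl/pub/Linux/fedora/linux/
```

Running it once against the mirror itself and once from a remote host may show whether the time goes into the filesystem walk or into the network.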
Now, for other mirrors serving rsync, such as mirrors.us.kernel.org
shown here, we see times such as:
07/27/2013 12:29:05 AM Starting crawl
07/27/2013 12:29:05 AM scanning Category Fedora Linux
07/27/2013 12:31:26 AM rsync time: 0:01:14.965211
07/27/2013 12:34:21 AM scanning Category Fedora EPEL
07/27/2013 12:34:41 AM rsync time: 0:00:12.548085
07/27/2013 12:34:59 AM scanning Category Fedora Secondary Arches
07/27/2013 12:40:12 AM rsync time: 0:02:49.361520
07/27/2013 12:47:02 AM scanning Category Fedora Other
07/27/2013 12:47:13 AM rsync time: 0:00:06.161032
07/27/2013 12:57:27 AM Total directories: 5805
07/27/2013 12:57:27 AM Changed to up2date: 0
07/27/2013 12:57:27 AM Changed to not up2date: 0
07/27/2013 12:57:27 AM Unchanged: 5805
07/27/2013 12:57:27 AM Unknown disposition: 0
07/27/2013 12:57:27 AM New HostCategoryDirs created: 87
07/27/2013 12:57:27 AM HostCategoryDirs now deleted on the master, marked not up2date: 0
07/27/2013 12:57:27 AM Ending crawl
The whole process takes under 30 minutes. By comparison, your mirror
is taking about 60x longer to serve the same directory listing than
other mirrors do. I would have to increase the timeout to crawl your
mirror from 2 hours to 30 hours, by which point the content would have
changed yet again, several times...
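For reference, the 60x figure comes from comparing the Fedora Linux rsync times in the two logs above. A rough sketch of that arithmetic follows; the to_seconds helper is a made-up name for illustration, and it rounds each duration to whole seconds before dividing.

```shell
# Convert an "H:MM:SS.ffffff" duration, as printed in the crawler log,
# to whole seconds (rounded).
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.0f\n", $1 * 3600 + $2 * 60 + $3 }'
}

slow=$(to_seconds "1:16:20.570514")   # Fedora Linux, ftp.icm.edu.pl
fast=$(to_seconds "0:01:14.965211")   # Fedora Linux, mirrors.us.kernel.org

# Integer division is close enough for a rough comparison.
echo "ratio: $((slow / fast))x"       # prints "ratio: 61x"
```

Scaling a 30-minute crawl by roughly that factor is where the 30-hour estimate comes from.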
I raise this because I know you do a good job running your mirror
generally, so this seems anomalous.
In the past, mirror admins have suggested reducing the value of
/proc/sys/vm/vfs_cache_pressure from the default of 100 to a lower
number, which causes the kernel to prefer to keep dentries when under
memory pressure:
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
vfs_cache_pressure
------------------
Controls the tendency of the kernel to reclaim the memory which is
used for caching of directory and inode objects.
At the default value of vfs_cache_pressure=100 the kernel will attempt
to reclaim dentries and inodes at a "fair" rate with respect to
pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes
the kernel to prefer to retain dentry and inode caches. When
vfs_cache_pressure=0, the kernel will never reclaim dentries and
inodes due to memory pressure and this can easily lead to
out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.
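If you want to experiment with this, a minimal sketch of checking and changing the setting might look like the following. The value 50 is purely an example I picked for illustration, not a tested recommendation, and the right number for your hardware and workload may well differ.

```shell
# Show the current value (the kernel default is 100).
cat /proc/sys/vm/vfs_cache_pressure

# Try a lower value at runtime (needs root); 50 is only an example value.
sysctl -w vm.vfs_cache_pressure=50

# To keep the setting across reboots, one option is a line in /etc/sysctl.conf:
#   vm.vfs_cache_pressure = 50
```

The runtime change takes effect immediately and is lost on reboot, so it is easy to back out if it doesn't help.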
Please take a look and see if a change is warranted on your side, or
if you see different behaviour for rsync directory listings than I do.
Thank you for being a long-standing Fedora mirror. I still had a
reference to the old sunsite name on our lists from before I took over
the role of Mirror Wrangler back in 2006. We appreciate your
commitment to Fedora.
Thanks,
Matt