Hi Matt,
Sorry for the slow reply. I just wanted to let you know that this has been
looked at, though no changes have been made to the server as of yet.
The server that it's hosted on is running FreeBSD, which doesn't seem to
have vfs_cache_pressure as a kernel setting. There are other settings
relating to vfs cache, so I've been looking into which ones might help
make this better.
I may just end up moving the Fedora mirror to a different server though :)
-Dan
On Fri, 26 Jul 2013, Matt Domsch wrote:
Greetings Dan. I'm the Mirror Wrangler for the Fedora Project, and
author of the tool we use to know which mirrors are up-to-date:
MirrorManager.
For most mirrors, an rsync directory listing of each of the two Category
trees you're carrying (Fedora Linux, Fedora EPEL) is the fastest way
to get a full list of what content your mirror has. The crawler falls
back to individual HTTP HEAD requests on a subset of the files if rsync
isn't available, and to FTP DIR calls if HTTP isn't available.
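As a rough illustration of that fallback order (this is not
MirrorManager's actual code; the function name and the three capability
flags are hypothetical), the choice could be sketched in Python as:

```python
def pick_crawl_method(has_rsync: bool, has_http: bool, has_ftp: bool) -> str:
    """Return the listing method to try, in the preference order described above."""
    if has_rsync:
        return "rsync"      # one directory listing per category tree
    if has_http:
        return "http-head"  # individual HEAD requests on a subset of files
    if has_ftp:
        return "ftp-dir"    # FTP DIR calls as the last resort
    raise ValueError("mirror offers no supported access method")
```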
In your case, as you don't have an rsync target, the crawler takes a
relatively long time to make all the HTTP HEAD calls to your mirror:
over 2 hours. Nearly all other mirrors that likewise don't have rsync
take well under 2 hours.
Now, for other mirrors serving rsync, such as mirrors.us.kernel.org
shown here, we see times such as:
07/27/2013 12:29:05 AM Starting crawl
07/27/2013 12:29:05 AM scanning Category Fedora Linux
07/27/2013 12:31:26 AM rsync time: 0:01:14.965211
07/27/2013 12:34:21 AM scanning Category Fedora EPEL
07/27/2013 12:34:41 AM rsync time: 0:00:12.548085
07/27/2013 12:34:59 AM scanning Category Fedora Secondary Arches
07/27/2013 12:40:12 AM rsync time: 0:02:49.361520
07/27/2013 12:47:02 AM scanning Category Fedora Other
07/27/2013 12:47:13 AM rsync time: 0:00:06.161032
07/27/2013 12:57:27 AM Total directories: 5805
07/27/2013 12:57:27 AM Changed to up2date: 0
07/27/2013 12:57:27 AM Changed to not up2date: 0
07/27/2013 12:57:27 AM Unchanged: 5805
07/27/2013 12:57:27 AM Unknown disposition: 0
07/27/2013 12:57:27 AM New HostCategoryDirs created: 87
07/27/2013 12:57:27 AM HostCategoryDirs now deleted on the master, marked not up2date: 0
07/27/2013 12:57:27 AM Ending crawl
The whole process takes under 30 minutes.
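If it helps to check that figure against a crawl log, here is a small
Python sketch that computes the elapsed time between the "Starting
crawl" and "Ending crawl" lines. It assumes the timestamp layout shown
in the log above and an English locale for the AM/PM marker; the
function name is mine, not part of the crawler.

```python
from datetime import datetime

# Timestamp layout used in the crawler log excerpt above.
LOG_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"

def crawl_duration(log_lines):
    """Return the timedelta between the 'Starting crawl' and 'Ending crawl' lines."""
    start = end = None
    for line in log_lines:
        # The first three whitespace-separated fields are date, time, AM/PM.
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        stamp = datetime.strptime(" ".join(parts[:3]), LOG_TIME_FORMAT)
        message = parts[3].strip()
        if message == "Starting crawl":
            start = stamp
        elif message == "Ending crawl":
            end = stamp
    if start is None or end is None:
        raise ValueError("log is missing a start or end marker")
    return end - start

sample = [
    "07/27/2013 12:29:05 AM Starting crawl",
    "07/27/2013 12:57:27 AM Ending crawl",
]
print(crawl_duration(sample))  # 0:28:22
```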
I raise this because I know you do a good job running your mirror
generally, so this seems anomalous. I also know you have HTTP
KeepAlives turned on; I can see those results in the crawler debug
logs. It seems each HTTP HEAD request takes a second or more, which,
when repeated hundreds of times across all the directories in your
complete mirror, adds up.
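To reproduce the per-request timing from your side, something like the
following Python sketch may be useful. It sends HEAD requests over a
single connection and records how long each one takes. This is not the
crawler's own code; the function name and the connection_factory
parameter (there only so the connection class can be swapped out) are
my own.

```python
import http.client
import time

def time_head_requests(host, paths, connection_factory=http.client.HTTPConnection):
    """Issue HEAD requests over one connection; return (path, status, seconds) tuples."""
    conn = connection_factory(host)
    results = []
    try:
        for path in paths:
            started = time.perf_counter()
            conn.request("HEAD", path)
            response = conn.getresponse()
            # A HEAD response has no body; read() just completes the
            # response so the same connection can carry the next request.
            response.read()
            results.append((path, response.status, time.perf_counter() - started))
    finally:
        conn.close()
    return results
```

You would call it as time_head_requests(host, list_of_paths) with your
mirror's hostname and a handful of file paths from the tree.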
In the past, mirror admins have suggested reducing the value of
/proc/sys/vm/vfs_cache_pressure, from the default value 100, to a
lower number, causing the kernel to prefer to keep dentries when under
memory pressure:
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
vfs_cache_pressure
------------------
Controls the tendency of the kernel to reclaim the memory which is
used for caching of directory and inode objects.
At the default value of vfs_cache_pressure=100 the kernel will attempt
to reclaim dentries and inodes at a "fair" rate with respect to
pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes
the kernel to prefer to retain dentry and inode caches. When
vfs_cache_pressure=0, the kernel will never reclaim dentries and
inodes due to memory pressure and this can easily lead to
out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.
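For reference, a minimal Python sketch for checking the current value on
a Linux host. The function name is mine, and returning None when the
file is absent (as on a non-Linux system) is a choice made for this
sketch. Changing the value requires root and is done by writing a new
number to the same file.

```python
from pathlib import Path

# Linux-only procfs entry named in the text above.
VFS_CACHE_PRESSURE = Path("/proc/sys/vm/vfs_cache_pressure")

def read_vfs_cache_pressure(path=VFS_CACHE_PRESSURE):
    """Return the current setting as an int, or None if the file does not exist."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None
```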
Please take a look and see if a change is warranted on your side, or
if you see different behaviour for HTTP HEAD calls than I do.
Thanks,
Matt