Over the last few months, most mirror problems have involved report_mirror reporting that a mirror is up to date and the crawler then marking the same mirror as not up to date. As a result, mirrors were added to and removed from the mirrorlist every few hours.
Because the crawler normally sees what the user sees, this means that report_mirror is often the one reporting the wrong mirror status.
In the future, I would like to ignore report_mirror results for non-private mirrors. For private mirrors, report_mirror is the only way to know which content they have. For public mirrors it is not really necessary: they are crawled anyway, and report_mirror does not currently contain enough information.
Are there any objections to ignoring report_mirror reports for public mirrors after the F29 release?
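A minimal sketch of the proposed rule, in Python. The `Mirror` class and `effective_up_to_date` function are hypothetical illustrations, not MirrorManager code; the only facts carried over from this mail are that private mirrors can only be known through report_mirror, while public mirrors are crawled anyway.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mirror:
    # Hypothetical, simplified model of a mirror host entry.
    name: str
    private: bool


def effective_up_to_date(mirror: Mirror,
                         report_says_current: Optional[bool],
                         crawler_says_current: Optional[bool]) -> Optional[bool]:
    """Return the status used to build the mirrorlist (None means unknown).

    Sketch of the proposal: private mirrors (which the crawler cannot
    reach) keep using report_mirror data; for public mirrors the
    report_mirror result is ignored and only the crawler decides.
    """
    if mirror.private:
        return report_says_current
    # Public mirror: report_mirror data is deliberately not consulted.
    return crawler_says_current
```

With a single source of truth per mirror, a public mirror whose report says "current" while the crawler says "stale" stays stale, instead of flipping in and out of the mirrorlist on every report/crawl cycle.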
Adrian
On 09/26/2018 10:53 AM, Adrian Reber wrote:
> [...]
Seems reasonable to me. Perhaps we could output a notice when such a mirror reports in, e.g. "you are a public mirror and we are ignoring report_mirror data for you; please see https://... for details".
kevin
"AR" == Adrian Reber adrian@lisas.de writes:
AR> I would like to ignore report_mirror results for non-private mirrors in the future.
I don't really see why this would have to be done globally unless we're just going to abandon the concept of reporting for public mirrors. Do you have an issue with quick-fedora-mirror checkins as well or just ones from report_mirror? Can I have quick-fedora-mirror provide additional data so that you can distinguish it from report_mirror?
And if checkins are ignored, then will mirrormanager have to wait until the next crawl before it knows that my mirrors have new content (even though that happens within a few minutes after the content is available)?
AR> For private mirrors report_mirror is the only way to know which content it has, for public mirrors it is not really necessary as the mirrors are crawled anyway and report_mirror does not contain enough information anyway right now.
Well, what information do you need?
- J<
On Fri, Sep 28, 2018 at 11:42:02AM -0500, Jason L Tibbitts III wrote:
> "AR" == Adrian Reber adrian@lisas.de writes:
> AR> I would like to ignore report_mirror results for non-private mirrors in the future.
> I don't really see why this would have to be done globally unless we're just going to abandon the concept of reporting for public mirrors. Do you have an issue with quick-fedora-mirror checkins as well or just ones from report_mirror? Can I have quick-fedora-mirror provide additional data so that you can distinguish it from report_mirror?
The problem is that people are reporting different content than they actually have. Sometimes they are reporting from different machines than the ones we scan. Sometimes it is some kind of cluster that is out of sync.
This has nothing to do with report_mirror or quick-fedora-mirror.
> And if checkins are ignored, then will mirrormanager have to wait until the next crawl before it knows that my mirrors have new content (even though that happens within a few minutes after the content is available)?
That is not really a problem in most cases, as we redirect clients to mirrors with older content anyway. This works because, most of the time, the metalink references up to three repomd.xml files.
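A rough sketch of why a lagging mirror can still be used, assuming that the metalink carries checksums for the current repomd.xml plus up to two older versions, and that a client accepts a mirror whose repomd.xml matches any of them. SHA-256 as the checksum type and the function name `mirror_acceptable` are assumptions for illustration; this is not the actual dnf/librepo code.

```python
import hashlib
from typing import Iterable


def mirror_acceptable(repomd_bytes: bytes, metalink_sha256s: Iterable[str]) -> bool:
    """Accept a mirror if its repomd.xml matches any checksum in the metalink.

    Assumption: the metalink lists the SHA-256 hex digest of the current
    repomd.xml and of (up to) two older versions.
    """
    digest = hashlib.sha256(repomd_bytes).hexdigest()
    return digest in set(metalink_sha256s)
```

Under this model a mirror that is one or two pushes behind still passes the check, so waiting for the next crawl to notice new content mostly costs freshness, not availability.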
> AR> For private mirrors report_mirror is the only way to know which content it has, for public mirrors it is not really necessary as the mirrors are crawled anyway and report_mirror does not contain enough information anyway right now.
> Well, what information do you need?
Everything the crawler has. I really like the idea behind report_mirror, but it does not work with what we have right now. The crawler is ultimately the component that decides the actual state of a mirror, and right now report_mirror is the main cause of the broken mirrors I am seeing.
Adrian
"AR" == Adrian Reber adrian@lisas.de writes:
AR> The problem is that people are reporting different things than they have.
Well, quick-fedora-mirror shouldn't be doing that, since it checks in what it has mirrored as part of each run. I suppose you could misconfigure the host it reports as, though.
AR> This has nothing to do with report_mirror or quick-fedora-mirror.
Well, it does, considering that you would then be ignoring the data those tools provide (which does take some effort to produce).
AR> Everything the crawler has.
That is less than what quick-fedora-mirror (or report_mirror) has access to, and it's also less timely. Then there's the difficulty of getting that data out of a mirror that only provides HTTP. I'd think checkins would be the preferred mechanism, with the crawler as a backup check that the mirror is configured correctly.
Anyway, my point is that if the data are insufficient then it's possible to fix that. If q-f-m checkins are generally less broken than report_mirror checkins, then we can make it easy to identify them. But sure, if the data simply aren't useful at all, then there is no point in sending them.
- J<
infrastructure@lists.fedoraproject.org