On Wed, Nov 21, 2018 at 4:53 AM Pierre-Yves Chibon <pingou(a)pingoured.fr> wrote:
Good Morning Everyone,
I'm starting to give some thought to
https://pagure.io/pagure/issue/861, which
asks for the ability to generate tarballs for commits and tags.
The code I have in mind will delegate the generation of tarballs to the
workers, as I expect it can be quite time-consuming for large repos.
One challenge I see with this feature is: how do we prevent it from being
used to DDoS an instance?
For example, how do we prevent bots or spammers from requesting a tarball
for every commit in the kernel git repo? That would fill up the disk pretty
quickly and lead to a denial of service for everybody else.
A few things I have in mind:
- Do not regenerate the tarball for commits we have already generated one for
- Clean up tarballs on a regular basis (say, keep them for 6h, 12h or 24h)
- Prevent users from generating more than X tarballs per hour (say, 3?)
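The first two ideas (reuse an existing tarball, expire it after a while) could be combined in a small index keyed by commit hash. This is only a minimal sketch under assumptions: `TarballCache`, `request`, `purge_expired` and the `enqueue` callback are hypothetical names, the index lives in memory, and the 24h TTL is just one of the values suggested above.

```python
import time
from typing import Callable, Dict, List


class TarballCache:
    """Hypothetical index of generated (or in-progress) tarballs, keyed by commit hash."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_seconds = ttl_seconds  # how long a tarball is kept (assumed 24h)
        self._clock = clock
        self._created: Dict[str, float] = {}  # commit hash -> time the job was queued

    def request(self, commit: str, enqueue: Callable[[str], None]) -> bool:
        """Queue a tarball job unless a fresh one already exists; True if queued."""
        now = self._clock()
        created = self._created.get(commit)
        if created is not None and now - created < self.ttl_seconds:
            return False  # already generated or being generated: do not regenerate
        enqueue(commit)  # hand the expensive work off to a background worker
        self._created[commit] = now
        return True

    def purge_expired(self) -> List[str]:
        """Forget (and return) commits whose tarballs are older than the TTL."""
        now = self._clock()
        expired = [c for c, t in self._created.items() if now - t >= self.ttl_seconds]
        for commit in expired:
            del self._created[commit]  # the caller would also delete the file on disk
        return expired
```

Recording the commit at enqueue time (rather than when the worker finishes) means a burst of identical requests only ever produces one job. A real deployment would keep this index somewhere shared (the database, for instance) rather than in process memory.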
Anything else you can think of that would help mitigate this potential issue?
I think you're on the right track, but a few things to keep in mind:
People will expect tarballs generated from a tag to follow the
name-gittag.(tar.gz|zip) naming scheme. In that case, you'll want to
keep a reference to the git commit each tarball was generated from, so
that if the tag is moved to a different commit, the stale tarballs can be
expired. The same goes for branch refs.
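A minimal sketch of the naming convention plus the "which commit was this built from" bookkeeping, assuming `name` is the project name. `tarball_name`, `RefTarballIndex` and `expire_moved` are hypothetical names, and the caller is assumed to supply the current ref-to-commit mapping (for instance parsed from `git show-ref`).

```python
from typing import Dict, List

ALLOWED_EXTENSIONS = ("tar.gz", "zip")


def tarball_name(project: str, tag: str, ext: str = "tar.gz") -> str:
    """Build a name following the name-gittag.(tar.gz|zip) convention."""
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {ext!r}")
    return f"{project}-{tag}.{ext}"


class RefTarballIndex:
    """Hypothetical record of which commit each cached ref tarball was built from."""

    def __init__(self) -> None:
        self._built_from: Dict[str, str] = {}  # ref name -> commit hash

    def record(self, ref: str, commit: str) -> None:
        self._built_from[ref] = commit

    def expire_moved(self, current: Dict[str, str]) -> List[str]:
        """Drop refs that now point elsewhere (or were deleted); return them.

        `current` maps each ref name to the commit it points to right now.
        """
        moved = [ref for ref, built in self._built_from.items()
                 if current.get(ref) != built]
        for ref in moved:
            del self._built_from[ref]  # the caller would also delete the tarballs
        return moved
```

A deleted tag shows up as `None` in `current.get(ref)`, so it is expired the same way as a moved one. Tag names may contain "/", so they would need escaping before being used directly as file names.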
If a tarball has been requested for a particular commit and that commit
later disappears from the history (through a rebase or similar), the
cached tarballs for it should be cleaned up as well.
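One way to detect such commits, sketched under assumptions: check reachability rather than existence, because a rebased-away commit stays in the object database until `git gc` prunes it. `git rev-list --all` lists every commit reachable from the refs (and HEAD); the helper names below are hypothetical, and cached commits are assumed to be stored as full 40-character hashes.

```python
import subprocess
from typing import Iterable, List, Set


def reachable_commits(repo_path: str) -> Set[str]:
    """Full hashes of every commit reachable from any ref (`git rev-list --all`)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--all"],
        check=True, capture_output=True, text=True,
    )
    return set(result.stdout.split())


def orphaned(cached_commits: Iterable[str], reachable: Set[str]) -> List[str]:
    """Cached commits (full hashes) that are no longer part of any history."""
    return [c for c in cached_commits if c not in reachable]
```

On a repository the size of the kernel, `rev-list --all` is not cheap, so this would belong in the periodic cleanup job rather than in the request path.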
If neither of those cases happens, tarballs generated for tags can
probably be cached much longer than those for branch refs or commits.
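A per-kind retention policy could look like the following minimal sketch. `RefKind`, `DEFAULT_TTL_HOURS` and `is_expired` are hypothetical, and the 7-day lifetime for tags is an assumed value, not one agreed on in this thread.

```python
from enum import Enum
from typing import Mapping, Optional


class RefKind(Enum):
    TAG = "tag"
    BRANCH = "branch"
    COMMIT = "commit"


# Assumed defaults: tags rarely move, so their tarballs live longer.
DEFAULT_TTL_HOURS: Mapping[RefKind, float] = {
    RefKind.TAG: 7 * 24,
    RefKind.BRANCH: 24,
    RefKind.COMMIT: 24,
}


def is_expired(kind: RefKind, age_hours: float,
               ttl_hours: Optional[Mapping[RefKind, float]] = None) -> bool:
    """True once a tarball of this kind has outlived its TTL."""
    ttls = DEFAULT_TTL_HOURS if ttl_hours is None else ttl_hours
    return age_hours >= ttls[kind]
```

This only covers the "nothing moved" case; a tag that is re-pointed or a commit that vanishes would still be expired immediately, regardless of age.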
As for cleaning up tarballs and limiting how many users can generate,
these settings should ideally be configurable, with good defaults. For
example, I'd say that cleaning tarballs every 24 hours is a good default,
and limiting users to 5 tarballs an hour is probably fine. If we have a
way of scaling out the tarball generation service, an instance could then
choose to allow a higher limit.
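The per-user limit could be a sliding-window counter whose limit and window come from configuration, defaulting to the 5 per hour suggested above. This is a hypothetical sketch (`TarballRateLimiter` and `allow` are illustrative names). It keeps state in process memory, so several web workers would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class TarballRateLimiter:
    """Hypothetical sliding-window limiter: at most N tarball requests per window."""

    def __init__(self, max_requests: int = 5, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests      # configurable, e.g. raised when scaled out
        self.window_seconds = window_seconds  # one hour by default
        self._clock = clock
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user: str) -> bool:
        """Record and allow a request for `user`, or refuse it if over the limit."""
        now = self._clock()
        history = self._history[user]
        # Drop timestamps that have slid out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True
```

For anonymous requests the key could be the client address instead of a username. Cache hits (an existing tarball being served) probably should not count against the limit, since they cost no generation work.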
Finally, you'll probably want to allow only zip and tar.gz, since the
other formats (such as tar.bz2 or tar.xz) are more expensive to compress
and increase the likelihood of stalling the system.
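A worker could enforce that whitelist before shelling out to `git archive`, which supports `zip` and (through its default `gzip -cn` filter) `tar.gz`. The sketch below only builds the argument list. The function name is hypothetical, and it assumes the ref has already been resolved to a commit hash, so user input can never be parsed as an option.

```python
import re
from typing import List

ALLOWED_FORMATS = ("tar.gz", "zip")  # cheap to compress; xz/bz2 deliberately excluded
_COMMIT_RE = re.compile(r"[0-9a-f]{7,40}")


def archive_command(repo_path: str, commit: str, prefix: str,
                    fmt: str, output: str) -> List[str]:
    """Build (but do not run) a `git archive` command for an allowed format."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported archive format: {fmt!r}")
    if not _COMMIT_RE.fullmatch(commit):
        # Only accept resolved commit hashes, never raw user-supplied refs.
        raise ValueError(f"not a commit hash: {commit!r}")
    return [
        "git", "-C", repo_path, "archive",
        f"--format={fmt}",
        f"--prefix={prefix}/",
        f"--output={output}",
        commit,
    ]
```

The returned list can be passed straight to `subprocess.run(..., check=True)`, which avoids going through a shell.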
--
真実はいつも一つ!/ Always, there's only one truth!