On 26.02.2004 16:42, Jonathan Gardner wrote:
Has anyone given serious thought to changing Yum so that it uses the
bittorrent protocol to retrieve RPMs? Especially in the case of updates,
when everyone and their grandmother needs to get the RPMs right away, this
would make a lot of sense. Yum could manage a repository of RPMs and
constantly serve those up so others can download parts of them via
bittorrent, all with permission, of course.
We have pondered this solution many times here, but there are
several important drawbacks:
1. Bittorrent is highly inefficient for a large collection of small
files. You will have to start a separate tracker item for each rpm,
and for some of them the amount of traffic generated just tracking
the p2p clients will outweigh the savings of using bittorrent. I
would imagine that several thousand tracker items would also be
quite processor-intensive.
2. You have to specifically punch holes in the firewall for
bittorrent -- not one, but a range of ports, actually. Something
most people will not do, so they will be constantly leeching.
3. Yum runs as root, so you suddenly have a very large amount of
code (yum+bittorrent libs) listening as root for incoming
connections. Yikes. Alternatively, you'd have to fork a downloader
process and communicate with it over some IPC mechanism. Either way
is painful.
As you see, bittorrent is not very beneficial here. However, a
bittorrent-like system used by *mirrors* could be of benefit. E.g.
the client-side connects to the main server and says "I want
foo-1.0-1.i386.rpm". The server then returns:
Checksum information for foo-1.0-1.i386.rpm:
bytes 0...10000: chksum1
bytes 10000...20000: chksum2
....
bytes n-10000...n: chksum n
The following servers claim to have it:
mirror.fooland.foo
mirror.barland.bar
....
mirror.bazland.baz
Go get it yourself.
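To make the checksum part of that response concrete, here is a rough
sketch of how the server side might compute it. The 10000-byte range
size comes from the example above; the function name chunk_checksums,
the use of SHA-256, and the half-open (start, end) ranges are
assumptions for illustration only, not anything yum does today.

```python
import hashlib

CHUNK_SIZE = 10000  # bytes per range, as in the example response


def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Return a list of (start, end, hex_digest), one per byte range.

    Ranges are half-open: bytes start..end-1 belong to the chunk.
    SHA-256 is an assumed choice of checksum.
    """
    manifest = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # The last chunk may be shorter than chunk_size.
        manifest.append((start, start + len(chunk), digest))
    return manifest
```

Since the package contents never change once published, this would
only need to run once per RPM, not once per request.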
The client then connects to the mirrors and fetches the ranges
specified in the server response, thus creating a primitive swarm.
The fetching can be done via http, ftp, and file as they all support
fetching by byte range.
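A minimal sketch of what the client side could look like, http only.
Everything named here (swarm_fetch, fetch_range_http, the manifest
layout of (start, end, sha256_hex) tuples with half-open ranges) is
hypothetical. It spreads the ranges over the mirrors round-robin and
moves on to the next mirror if one is unreachable or returns a chunk
that does not match its checksum.

```python
import hashlib
import urllib.request


def fetch_range_http(url, start, end):
    """Fetch bytes start..end-1 of url with an HTTP Range request."""
    headers = {"Range": "bytes=%d-%d" % (start, end - 1)}
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def swarm_fetch(filename, manifest, mirrors, fetch_range=fetch_range_http):
    """Assemble a file from byte ranges spread across several mirrors.

    manifest: list of (start, end, sha256_hex), half-open ranges.
    mirrors:  list of base URLs that claim to have the file.
    """
    parts = []
    for i, (start, end, expected) in enumerate(manifest):
        # Start each chunk on a different mirror to spread the load.
        for attempt in range(len(mirrors)):
            base = mirrors[(i + attempt) % len(mirrors)]
            url = base.rstrip("/") + "/" + filename
            try:
                chunk = fetch_range(url, start, end)
            except OSError:
                continue  # mirror unreachable, try the next one
            if hashlib.sha256(chunk).hexdigest() == expected:
                parts.append(chunk)
                break
        else:
            raise IOError("no mirror returned a valid chunk for bytes "
                          "%d-%d" % (start, end - 1))
    return b"".join(parts)
```

A mirror that ignores the Range header and sends the whole file will
simply fail the checksum and be skipped for that chunk. The assembled
RPM would still have to pass the usual signature check afterwards.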
This would allow for auto-balancing the mirror load, though this
solution is not without its own set of difficulties:
1. This still keeps thousands of trackers on the server, though
having dedicated servers and limited tracker traffic compared to
bittorrent should, in theory, make that easier to handle.
2. How to keep the list of mirrors current? Should they stay
constantly connected to the main server a la bittorrent clients?
Should they use some other bittorrent-like protocol for syncing with
each other?
3. As the tracker info for each package would be auto-generated,
there's no way to sign it (this would require keeping the key on the
server, which is a no-no). Attackers could potentially annoy a lot of
people by publishing bogus mirror data pointing to odd places. Though
this isn't really dangerous, as after all the final RPM fetched from
various servers by bits and pieces would still be cryptographically
signed.
This could be a fun project to play with, if anyone likes to mess
with things like that. :)
--
Konstantin ("Icon") Ryabitsev
Duke Physics Systems Admin, RHCE
I am looking for a job in Canada!
http://linux.duke.edu/~icon/cajob.ptml