On Jan 31, 2005, Jeff Pitman <symbiont(a)berlios.de> wrote:
> This could be driven by an optional parameter to createrepo, which
> provides a list of packages to create a delta with.
Err... Why? We already have repodata/, and we're creating the new
version in .repodata. We can use repodata/ however we like, I think.
> If it were fully automatic, it would only be a download win for the
> user.
And the servers.
> I would rather not utilize xdelta, because you're still regenerating
> the entire thing. Having xmlets that virtually add/subtract as a delta
> against primary.xml.gz would be optimal for both sides of the equation.
But then Seth rejects the idea because it makes for unmaintainable
code. And I sort of agree with him now that I see a simpler way to
accomplish the same bandwidth savings.
> Another advantage of the delta method is that the on-disk pickled
> objects (or whatever back-end store is used) could be updated
> incrementally based on XML snippets coming in, instead of regenerating
> the whole thing over again.
This is certainly a good point, but it is also trickier to get right.
And it might also turn out to be bigger: if you have to list what went
away, you're probably emitting more information than xdelta's `skip
these many bytes'. It's like comparing diff with xdelta: diff is
reversible because it contains what was removed and what was added
(plus optional context), whereas xdelta only contains what was
inserted and what portions of the original remained.
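To make the comparison concrete, here is a small Python sketch. It uses difflib as a stand-in for both tools (the package list is made up): a unified diff spells out the removed content, while an xdelta-style encoding only references surviving spans of the original plus inserted data.

```python
import difflib

old = ["pkg-a 1.0", "pkg-b 2.0", "pkg-c 3.0"]
new = ["pkg-a 1.0", "pkg-b 2.1", "pkg-c 3.0"]

# A unified diff records both the removed and the added line,
# which makes it reversible but larger.
diff = list(difflib.unified_diff(old, new, lineterm=""))
removed = [l for l in diff if l.startswith("-") and not l.startswith("---")]
added = [l for l in diff if l.startswith("+") and not l.startswith("+++")]

# An xdelta-style delta, by contrast, is just "copy these spans of the
# original, insert this new data" -- removals never appear explicitly,
# they are simply spans that no copy instruction references.
sm = difflib.SequenceMatcher(a=old, b=new)
delta = []
for op, i1, i2, j1, j2 in sm.get_opcodes():
    if op == "equal":
        delta.append(("copy", i1, i2))       # reference into the original
    elif j1 != j2:
        delta.append(("insert", new[j1:j2])) # only the new data travels
```

Note that `removed` carries the full old line, whereas the delta's copy instructions cost only a pair of offsets, which is the size argument above.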
Getting inserts small is trivial; getting removals small might be
trickier, and to take advantage of pickling we need the latter.
Unless... Does anyone feel like implementing an xml-aware xdelta-like
program in Python? :-)
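For what it's worth, a minimal sketch of what such an XML-aware delta could look like in Python, using ElementTree (the element layout, attribute names, and helper functions are all illustrative, not real createrepo structure). It diffs package entries by name, emits add/drop xmlets, and applies them incrementally to an in-memory store, covering both the bandwidth case and the pickled-store case discussed above:

```python
import xml.etree.ElementTree as ET

OLD = """<metadata>
  <package name="foo" version="1.0"/>
  <package name="bar" version="2.0"/>
</metadata>"""

NEW = """<metadata>
  <package name="foo" version="1.1"/>
  <package name="baz" version="0.1"/>
</metadata>"""

def index(xml_text):
    """Map package name -> its attributes (stand-in for the full entry)."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): dict(p.attrib) for p in root.iter("package")}

def make_delta(old_xml, new_xml):
    """Delta = entries to add/replace plus names to drop. Removals cost a
    name each, which is why they are harder to shrink than xdelta's skips."""
    old, new = index(old_xml), index(new_xml)
    add = {n: a for n, a in new.items() if old.get(n) != a}
    drop = [n for n in old if n not in new]
    return add, drop

def apply_delta(store, delta):
    """Incrementally update a store (e.g. the unpickled objects)
    without regenerating it from scratch."""
    add, drop = delta
    for name in drop:
        store.pop(name, None)
    store.update(add)
    return store

store = index(OLD)                               # existing on-disk state
result = apply_delta(store, make_delta(OLD, NEW))
```

Because the delta operates on package entries rather than bytes, it stays meaningful to the back-end store, at the cost of carrying explicit removal records.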
--
Alexandre Oliva
http://www.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer aoliva(a){redhat.com, gcc.gnu.org}
Free Software Evangelist oliva(a){lsd.ic.unicamp.br, gnu.org}