On Jan 31, 2005, Jeff Pitman <symbiont(a)berlios.de> wrote:
> This could be driven by an optional parameter to createrepo, which
> provides a list of packages to create a delta with.
Err... Why? We already have repodata/, and we're creating the new
version in .repodata. We can use repodata/ however we like, I think.
> If it were fully automatic, it would only be a download win for the
> user.
And the servers.
> I would rather not utilize xdelta, because you're still regenerating
> the entire thing. Having xmlets that virtually add/subtract as a delta
> against primary.xml.gz would be optimal for both sides of the equation.
But then Seth rejects the idea because it makes for unmaintainable
code. And I sort of agree with him now that I see a simpler way to
accomplish the same bandwidth savings.
> Another advantage of the delta method is that the on-disk pickled
> objects (or whatever back-end store is used) could be updated
> incrementally based on XML snippets coming in, instead of regenerating
> the whole thing over again.
This is certainly a good point, but it is also trickier to get right.
And it might also turn out to be bigger: if you have to list what went
away, you're probably emitting more information than xdelta's `skip
these many bytes'. It's like comparing diff with xdelta: diff is
reversible because it contains what was removed and what was added
(plus optional context), whereas xdelta only contains what was
inserted and what portions of the original remained.
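To make the comparison concrete, here is a small Python sketch. It uses difflib as a stand-in for both tools (the package list is made up): a unified diff spells out the removed content, while an xdelta-style encoding only references surviving spans of the original plus inserted data.

```python
import difflib

old = ["pkg-a 1.0", "pkg-b 2.0", "pkg-c 3.0"]
new = ["pkg-a 1.0", "pkg-b 2.1", "pkg-c 3.0"]

# A unified diff records both the removed and the added line,
# which makes it reversible but larger.
diff = list(difflib.unified_diff(old, new, lineterm=""))
removed = [l for l in diff if l.startswith("-") and not l.startswith("---")]
added = [l for l in diff if l.startswith("+") and not l.startswith("+++")]

# An xdelta-style delta, by contrast, is just "copy these spans of the
# original, insert this new data" -- removals never appear explicitly,
# they are simply spans that no copy instruction references.
sm = difflib.SequenceMatcher(a=old, b=new)
delta = []
for op, i1, i2, j1, j2 in sm.get_opcodes():
    if op == "equal":
        delta.append(("copy", i1, i2))       # reference into the original
    elif j1 != j2:
        delta.append(("insert", new[j1:j2])) # only the new data travels
```

Note that `removed` carries the full old line, whereas the delta's copy instructions cost only a pair of offsets, which is the size argument above.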
Getting inserts small is trivial; getting removals small might be
trickier, and to take advantage of pickling we need the latter.
Unless... Does anyone feel like implementing an xml-aware xdelta-like
program in Python? :-)
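For what it's worth, a minimal sketch of what such an XML-aware delta could look like in Python, using ElementTree (the element layout, attribute names, and helper functions are all illustrative, not real createrepo structure). It diffs package entries by name, emits add/drop xmlets, and applies them incrementally to an in-memory store, covering both the bandwidth case and the pickled-store case discussed above:

```python
import xml.etree.ElementTree as ET

OLD = """<metadata>
  <package name="foo" version="1.0"/>
  <package name="bar" version="2.0"/>
</metadata>"""

NEW = """<metadata>
  <package name="foo" version="1.1"/>
  <package name="baz" version="0.1"/>
</metadata>"""

def index(xml_text):
    """Map package name -> its attributes (stand-in for the full entry)."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): dict(p.attrib) for p in root.iter("package")}

def make_delta(old_xml, new_xml):
    """Delta = entries to add/replace plus names to drop. Removals cost a
    name each, which is why they are harder to shrink than xdelta's skips."""
    old, new = index(old_xml), index(new_xml)
    add = {n: a for n, a in new.items() if old.get(n) != a}
    drop = [n for n in old if n not in new]
    return add, drop

def apply_delta(store, delta):
    """Incrementally update a store (e.g. the unpickled objects)
    without regenerating it from scratch."""
    add, drop = delta
    for name in drop:
        store.pop(name, None)
    store.update(add)
    return store

store = index(OLD)                               # existing on-disk state
result = apply_delta(store, make_delta(OLD, NEW))
```

Because the delta operates on package entries rather than bytes, it stays meaningful to the back-end store, at the cost of carrying explicit removal records.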
--
Alexandre Oliva
http://www.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer aoliva(a){redhat.com, gcc.gnu.org}
Free Software Evangelist oliva(a){lsd.ic.unicamp.br, gnu.org}