On Wed, 2018-11-21 at 14:36 +0100, Kamil Paral wrote:
> On Fri, Nov 16, 2018 at 11:13 PM Jonathan Dieter
> <jdieter(a)gmail.com> wrote:
> > For reference, this is in reply to Paul's email about lifecycle
> > objectives, specifically focusing on problem statement #1[1].
> >
> > <tl;dr>
> > Have rpm use zchunk as its compression format, removing the need for
> > deltarpms, and thus reducing compose time. This will require changes
> > to both the rpm format and new features in the zchunk format.
> > </tl;dr>
> Hey Jonathan,
> thanks for working on this. The proposed changes sound good to me.
> I'm a bit worried that zchunk is not yet a proven format, so it might
> be a good idea to use it for metadata first, see whether it works as
> expected, and then push it for RPM files. But that's for more
> technical people to judge.
> I have some concrete questions, though:
> 1. I have noticed that especially with large RPMs (firefox, chrome,
> atom, game data like 0ad-data, etc.), my PCs are mostly bottlenecked
> by CPU when installing them. And that's with a modern 3.5+GHz CPU.
> That's because RPM decompression runs in a single thread only, and xz
> is just unbelievably slow. I wonder, would zchunk used as an RPM
> compression algorithm improve this substantially? Can it decompress
> in multiple threads and/or does it have much faster decompression
> speeds (and how much)? I don't care about RPM size increase, but I'd
> really like to have them installed fast. (That's of course just my
> personal preference, but this also affects the speed of mock builds
> and such, so I think it's relevant.)
The zstd compression that zchunk uses internally is designed to be
faster than even gzip at decompression. Currently zchunk is single-
threaded, but, given that each chunk is independent, making it multi-
threaded should be pretty trivial, and is on the todo list.
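To illustrate why independent chunks make multi-threaded decompression straightforward, here is a minimal Python sketch. It is not zchunk code and does not use the zchunk file format: zlib from the standard library stands in for zstd, and the names compress_chunks and decompress_parallel are hypothetical, chosen only for this example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data, chunk_size):
    """Split data into fixed-size chunks and compress each one on its own.

    Assumption: zlib is a stand-in for zstd; real zchunk chunking differs.
    """
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def decompress_parallel(chunks, workers=4):
    """Decompress independently compressed chunks in a thread pool.

    No chunk depends on another, so each can go to any worker.
    pool.map returns results in input order, so joining them
    reproduces the original byte stream.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))


if __name__ == "__main__":
    original = bytes(range(256)) * 1000
    chunks = compress_chunks(original, 4096)
    assert decompress_parallel(chunks) == original
```

The point of the sketch is only the structure: because no chunk needs state from its neighbours, the work can be handed out to workers without any coordination beyond keeping the output in order.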
> 2. In our past QA efforts in Fedora, we had use cases for retrieving
> rpm header data without retrieving the actual content (the payload).
> That was for cases when we needed to check e.g. dependency issues,
> but the rpms were not placed in a repository yet (i.e. no easy access
> to their metadata) and it was slow and wasteful to download the whole
> rpm just to get the header. Will the new zchunk compression still
> make it possible to retrieve just the header without accessing all
> the payload data? (It would be great to make this accessible from
> Python and not just C, but that's a plea I should direct to rpm
> maintainers, I guess).
The zchunk format supports the concept of multiple independent streams
in a single file. A zchunk rpm would contain two streams, the rpm
header and the rpm payload. Since downloading a zchunk file is two
steps already (downloading the zchunk header, and then downloading the
required chunks), it should be easy enough to download only the chunks
needed for the rpm header stream.
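The two-step idea can be sketched as follows, under loud assumptions: ChunkEntry and ranges_for_stream are hypothetical names, and the index layout (stream id, offset, length) is a toy model for illustration, not the actual zchunk header format. Given an already downloaded chunk index, the sketch picks the byte ranges that belong to one stream and merges adjacent ones, which is what a client would then request from the server.

```python
from typing import NamedTuple


class ChunkEntry(NamedTuple):
    """Toy index entry; not the real zchunk header layout."""
    stream: int   # hypothetical: 0 = rpm header stream, 1 = rpm payload stream
    offset: int   # byte offset of the chunk within the file
    length: int   # compressed length of the chunk in bytes


def ranges_for_stream(index, stream):
    """Return merged (start, end) byte ranges covering one stream's chunks.

    The end of each range is exclusive. Chunks that sit back to back in
    the file are merged so they can be fetched with a single request.
    """
    ranges = []
    wanted = sorted(
        (e for e in index if e.stream == stream),
        key=lambda e: e.offset,
    )
    for entry in wanted:
        start, end = entry.offset, entry.offset + entry.length
        if ranges and ranges[-1][1] == start:
            # Contiguous with the previous range: extend it.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    index = [
        ChunkEntry(0, 100, 50),
        ChunkEntry(0, 150, 30),
        ChunkEntry(1, 180, 1000),
        ChunkEntry(0, 1180, 20),
    ]
    # Only the header stream's chunks are selected; the payload is skipped.
    print(ranges_for_stream(index, 0))
```

With ranges like these, a client could fetch just the header stream and never touch the payload chunks.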
As for a Python API, I would love for zchunk to have that too, but I
haven't had the time yet.
I hope that helps.
Jonathan