Hello,
Related to recent space saving discussions, I came across PLD's rpm-build-macros package recently, and found that they hardlink identical *.pyc and *.pyo. In a lot of cases, they're the same, and there's some potential for saving some MB "for free". On my FC6 x86_64 box:
$ /usr/sbin/hardlink -ncv /usr/lib*/python2.4 2>&1 | tail -n 1
Would save 11116544
The PLD implementation looks like this:
# Hardlink binary identical .pyc and .pyo files
# (idea by glen <at> pld-linux <dot> org)
%__spec_install_post_py_hardlink {\
%{!?no_install_post_py_hardlink: __spec_install_post_py_hardlink() { \
    [ ! -d "$RPM_BUILD_ROOT" ] || find "$RPM_BUILD_ROOT" -name '*.pyc' | while read a; do \
        b="$(echo $a|sed -e 's/.pyc$/.pyo/')"; \
        if cmp -s "$a" "$b"; then \
            ln -f "$a" "$b"; \
        fi; \
    done \
}; __spec_install_post_py_hardlink } }
The use of "cmp" would require diffutils installed. Or the above could be converted to use hardlink instead (which would have to be made sure to be around) or maybe sha1sum (in coreutils, pretty much always around in buildroots).
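For illustration, the sha1sum variant could look roughly like the sketch below. It is not the PLD macro and not a tested patch: the function name hardlink_pyc_pyo and the scratch directory standing in for $RPM_BUILD_ROOT are made up for the example, and it assumes only sha1sum, find and ln are available.

```shell
#!/bin/sh
# Sketch only: hardlink each .pyo to its .pyc when their SHA-1 digests match.
# hardlink_pyc_pyo is a hypothetical helper name, not part of rpm or PLD.
hardlink_pyc_pyo() {
    root=$1
    [ -d "$root" ] || return 0
    find "$root" -type f -name '*.pyc' | while IFS= read -r pyc; do
        pyo="${pyc%.pyc}.pyo"
        [ -f "$pyo" ] || continue
        # Reading from stdin makes sha1sum print "<digest>  -", so the two
        # outputs are equal exactly when the digests are equal.
        if [ "$(sha1sum < "$pyc")" = "$(sha1sum < "$pyo")" ]; then
            ln -f "$pyc" "$pyo"
        fi
    done
}

# Demo on a scratch directory (an assumption standing in for $RPM_BUILD_ROOT).
demo=$(mktemp -d)
printf 'same' > "$demo/a.pyc"; printf 'same' > "$demo/a.pyo"
printf 'one'  > "$demo/b.pyc"; printf 'two'  > "$demo/b.pyo"
hardlink_pyc_pyo "$demo"
ls -li "$demo"   # a.pyc and a.pyo should now share an inode; b.* should not
```

Compared with cmp this reads each file in full and forks two extra processes per pair, which is probably what makes it feel clumsy, but it keeps the dependency list down to coreutils and findutils.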
I suppose something like the above could be easily added to redhat-rpm-config or rpm, eg. embedded in brp-python-bytecompile or run after it in %__os_install_post.
Worth it? Other comments?
On 03.04.2007 12:42, Ville Skyttä wrote:
> Worth it? Other comments?
Hehe, sounds interesting. Did you tell the Live-CD guys about it? I'd say they will probably be very interested in something like this (they could run hardlink directly for now until we decide what we want to do).
CU thl
On Tuesday 03 April 2007, Thorsten Leemhuis wrote:
> On 03.04.2007 12:42, Ville Skyttä wrote:
> > Worth it? Other comments?
> Hehe, sounds interesting. Did you tell the Live-CD guys about it? I'd say they will probably be very interested in something like this (they could run hardlink directly for now until we decide what we want to do).
Nope, just brought it up in last week's packaging meeting and now posted here. Feel free to forward.
Another thing they could be interested in (again on my FC6 x86_64):
$ /usr/sbin/hardlink -ncv /usr/share/doc 2>&1 | tail -n 1
Would save 6692864
(of which COPYING's share is about 3.2M)
On Tue, Apr 03, 2007 at 01:42:24PM +0300, Ville Skyttä wrote:
> Related to recent space saving discussions, I came across PLD's rpm-build-macros package recently, and found that they hardlink identical *.pyc and *.pyo. In a lot of cases, they're the same, and there's some potential for saving some MB "for free". On my FC6 x86_64 box:
> $ /usr/sbin/hardlink -ncv /usr/lib*/python2.4 2>&1 | tail -n 1
> Would save 11116544
I get more than twice as much on a typical FC6/x86_64 system: 27275264. That's 26 MB out of 166 MB total, i.e. a saving of about 16%.
# du -sc /usr/lib*/python2.4 | tail -n 1
170200 total
On another system I get 21 MB out of 144 MB total, i.e. about 15%.
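As a quick check of the arithmetic, the byte count printed by hardlink and the du total can be converted to the same unit as sketched below. This assumes du is reporting 1 KiB blocks (the GNU default); savings_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: express "would save" bytes as a share of a du total.
savings_summary() {
    # $1: bytes reported by "hardlink -ncv"
    # $2: total from "du -sc", assumed to be in 1 KiB blocks
    LC_ALL=C awk -v saved="$1" -v total_kib="$2" 'BEGIN {
        printf "%.1f MiB of %.1f MiB (%.1f%%)\n",
            saved / 1048576, total_kib / 1024, 100 * saved / (total_kib * 1024)
    }'
}

savings_summary 27275264 170200
```

With the numbers from the first system this comes out just under 16%, in line with the figure quoted above.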
> The PLD implementation looks like this:
> # Hardlink binary identical .pyc and .pyo files
> # (idea by glen <at> pld-linux <dot> org)
> %__spec_install_post_py_hardlink {\
> %{!?no_install_post_py_hardlink: __spec_install_post_py_hardlink() { \
>     [ ! -d "$RPM_BUILD_ROOT" ] || find "$RPM_BUILD_ROOT" -name '*.pyc' | while read a; do \
>         b="$(echo $a|sed -e 's/.pyc$/.pyo/')"; \
>         if cmp -s "$a" "$b"; then \
>             ln -f "$a" "$b"; \
>         fi; \
>     done \
> }; __spec_install_post_py_hardlink } }
> The use of "cmp" would require diffutils installed. Or the above could be converted to use hardlink instead (which would have to be made sure to be around) or maybe sha1sum (in coreutils, pretty much always around in buildroots).
> I suppose something like the above could be easily added to redhat-rpm-config or rpm, eg. embedded in brp-python-bytecompile or run after it in %__os_install_post.
brp-python-bytecompile sounds like the best spot since the pyc/pyos are created there. Maybe we should ship an improved brp-python-bytecompile in redhat-rpm-config while lobbying rpm upstream to adopt it?
> Worth it? Other comments?
A 15% space gain (under python) w/o any drawbacks? Always worth it. :)
"VS" == Ville Skyttä ville.skytta@iki.fi writes:
VS> I suppose something like the above could be easily added to
VS> redhat-rpm-config or rpm, eg. embedded in brp-python-bytecompile
VS> or run after it in %__os_install_post.
I expect that Python is where most of the savings come from, but it might be worth running something like this to catch duplicated files in any package. The only issue is that we have to be careful to only look within single directories, because we cannot predict where filesystem boundaries will be.
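A rough sketch of the per-directory idea follows, assuming sha1sum and a find that supports -maxdepth. Files are only ever linked to other files in the same directory, so the links should stay on one filesystem. hardlink_within_dirs is a hypothetical helper, the demo tree is made up, and file names containing newlines, backslashes, or leading/trailing blanks are not handled.

```shell
#!/bin/sh
# Sketch only: hardlink identical regular files, one directory at a time.
hardlink_within_dirs() {
    root=$1
    find "$root" -type d | while IFS= read -r dir; do
        prev_sum=
        prev_file=
        # Checksum only the files directly inside $dir (no recursion) and
        # sort so that identical digests end up on adjacent lines.
        find "$dir" -maxdepth 1 -type f -exec sha1sum {} + | LC_ALL=C sort |
        while read -r sum file; do
            if [ "$sum" = "$prev_sum" ]; then
                # Same digest as the previous file in this directory.
                ln -f "$prev_file" "$file"
            else
                prev_sum=$sum
                prev_file=$file
            fi
        done
    done
}

# Demo: two identical files in d1, one more copy in d2.
demo=$(mktemp -d)
mkdir "$demo/d1" "$demo/d2"
printf 'GPL' > "$demo/d1/COPYING"; printf 'GPL' > "$demo/d1/LICENSE"
printf 'GPL' > "$demo/d2/COPYING"
hardlink_within_dirs "$demo"
ls -liR "$demo"   # only the two files in d1 should share an inode
```

Restricting the comparison to one directory gives up the cross-package savings (the many copies of COPYING under /usr/share/doc, for example), which is the price of not guessing where mount points will be.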
- J<
On Tuesday 03 April 2007, Ville Skyttä wrote:
> Hello,
> Related to recent space saving discussions, I came across PLD's rpm-build-macros package recently, and found that they hardlink identical *.pyc and *.pyo.
> [...]
> The PLD implementation looks like this:
> [...]
> The use of "cmp" would require diffutils installed. Or the above could be converted to use hardlink instead (which would have to be made sure to be around) or maybe sha1sum (in coreutils, pretty much always around in buildroots).
> I suppose something like the above could be easily added to redhat-rpm-config or rpm, eg. embedded in brp-python-bytecompile or run after it in %__os_install_post.
Jeremy pinged me about resurrecting this thread, so here goes. The original thread starts at http://www.redhat.com/archives/fedora-packaging/2007-April/msg00003.html for those who missed it.
Anyway, attached is a patch against rpm.org hg for discussion - seems somewhat clumsy to use sha1sum for this but I guess it could be acceptable. Tested on just a few python packages on F-7. Better implementations certainly exist, and are welcome :)
On Wed, 2007-06-13 at 21:51 +0300, Ville Skyttä wrote:
> On Tuesday 03 April 2007, Ville Skyttä wrote:
> > Related to recent space saving discussions, I came across PLD's rpm-build-macros package recently, and found that they hardlink identical *.pyc and *.pyo.
> > [...]
> > The PLD implementation looks like this:
> > [...]
> > The use of "cmp" would require diffutils installed. Or the above could be converted to use hardlink instead (which would have to be made sure to be around) or maybe sha1sum (in coreutils, pretty much always around in buildroots).
> > I suppose something like the above could be easily added to redhat-rpm-config or rpm, eg. embedded in brp-python-bytecompile or run after it in %__os_install_post.
> Jeremy pinged me about resurrecting this thread, so here goes. The original thread starts at http://www.redhat.com/archives/fedora-packaging/2007-April/msg00003.html for those who missed it.
> Anyway, attached is a patch against rpm.org hg for discussion - seems somewhat clumsy to use sha1sum for this but I guess it could be acceptable. Tested on just a few python packages on F-7. Better implementations certainly exist, and are welcome :)
Looks okay to me. Probably the easiest thing to do is to put it in redhat-rpm-config for now[1] and then go from there. If anyone disagrees with it, raise your hands... otherwise, I'll get it in at the start of next week.
Jeremy
[1] Basically, we'll change the macros to call a brp-python-post in redhat-rpm-config. That script will call the one in stock rpm as well as do the hardlink steps.
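The wrapper described in [1] might be shaped roughly like the sketch below. This is a guess at the structure, not the script that was shipped: the function name brp_python_post, the STOCK_BRP variable and its default path are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: run the stock byte-compile script, then hardlink identical
# .pyc/.pyo pairs. Names and the default path below are assumptions.
brp_python_post() {
    stock=${STOCK_BRP:-/usr/lib/rpm/brp-python-bytecompile}

    # Nothing to do without a real buildroot.
    if [ -z "$RPM_BUILD_ROOT" ] || [ "$RPM_BUILD_ROOT" = "/" ]; then
        return 0
    fi

    # Step 1: stock byte-compilation, passing any arguments straight through.
    if [ -x "$stock" ]; then
        "$stock" "$@" || return $?
    fi

    # Step 2: hardlink each .pyo to its .pyc when the SHA-1 digests match.
    find "$RPM_BUILD_ROOT" -type f -name '*.pyc' | while IFS= read -r pyc; do
        pyo="${pyc%.pyc}.pyo"
        if [ -f "$pyo" ] && [ "$(sha1sum < "$pyc")" = "$(sha1sum < "$pyo")" ]; then
            ln -f "$pyc" "$pyo"
        fi
    done
}

# A standalone script would end with:  brp_python_post "$@"
```

Keeping the stock script untouched and layering the hardlink step on top means the wrapper can be dropped again if rpm upstream picks up the change.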
On Wed, 2007-06-13 at 15:31 -0400, Jeremy Katz wrote:
> On Wed, 2007-06-13 at 21:51 +0300, Ville Skyttä wrote:
> > On Tuesday 03 April 2007, Ville Skyttä wrote:
> > > Related to recent space saving discussions, I came across PLD's rpm-build-macros package recently, and found that they hardlink identical *.pyc and *.pyo.
> > > [...]
> > > The PLD implementation looks like this:
> > > [...]
> > > The use of "cmp" would require diffutils installed. Or the above could be converted to use hardlink instead (which would have to be made sure to be around) or maybe sha1sum (in coreutils, pretty much always around in buildroots).
> > > I suppose something like the above could be easily added to redhat-rpm-config or rpm, eg. embedded in brp-python-bytecompile or run after it in %__os_install_post.
> > Jeremy pinged me about resurrecting this thread, so here goes. The original thread starts at http://www.redhat.com/archives/fedora-packaging/2007-April/msg00003.html for those who missed it.
> > Anyway, attached is a patch against rpm.org hg for discussion - seems somewhat clumsy to use sha1sum for this but I guess it could be acceptable. Tested on just a few python packages on F-7. Better implementations certainly exist, and are welcome :)
> Looks okay to me. Probably the easiest thing to do is to put it in redhat-rpm-config for now[1] and then go from there. If anyone disagrees with it, raise your hands... otherwise, I'll get it in at the start of next week.
And this is done in redhat-rpm-config-9.0.0-1 that I just built.
Jeremy
packaging@lists.fedoraproject.org