I am one of the maintainers of the ntl package, which is used by some numeric applications (e.g., Macaulay2 and sagemath). Upstream supports use of the PCLMUL instruction, the AVX instructions, and the FMA instructions to speed up various computations. We can't use any of those in Fedora, since we have to support a baseline x86_64.
Well, that's kind of a downer. I could advertise that people with newer CPUs ought to rebuild the ntl package for their own CPUs, but what's a distribution for if people have to rebuild packages? I've been looking for a way to automatically support more recent CPUs.
Yesterday I sent a patch upstream that uses gcc's indirect function support together with __attribute__((target ...)) to build vanilla x86_64, PCLMUL-enabled, AVX-enabled, and FMA-enabled varieties of several functions. Upstream was initially excited about this but then, on further reflection, offered the opinion that this approach is dangerous. The problem is that some of the types involved may change ABI depending on the instruction set in use, and therefore it would be necessary to build larger portions of the library for each supported CPU variant. At that point, as upstream said, we might as well just build the entire library for each variant. The problem then is how to choose which version of the library to use at load time.
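For readers unfamiliar with the mechanism, here is a minimal sketch of what combining GCC's indirect functions with __attribute__((target ...)) can look like. It is not the patch sent upstream: the dot-product kernel and every name in it (dot, dot_baseline, dot_avx, resolve_dot) are made up for illustration, and it assumes GCC (or a compatible compiler) on x86_64 GNU/Linux.

```c
#include <stddef.h>

/* Hypothetical kernel, not taken from NTL: a plain dot product. */
static double dot_baseline(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Same source, but the compiler may use AVX instructions in this body. */
__attribute__((target("avx")))
static double dot_avx(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Resolver: run by the dynamic linker when the symbol `dot` is bound. */
static dot_fn resolve_dot(void)
{
    __builtin_cpu_init();  /* needed before __builtin_cpu_supports in a resolver */
    return __builtin_cpu_supports("avx") ? dot_avx : dot_baseline;
}

/* Public symbol; callers see an ordinary function. */
double dot(const double *a, const double *b, size_t n)
    __attribute__((ifunc("resolve_dot")));
```

Callers simply call dot(a, b, n). Note that this only selects a function body per CPU; it does nothing about the ABI concern raised by upstream for types whose layout or calling convention depends on the instruction set.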
On some platforms, ld.so offers "hardware capabilities", such as sse2 on i386. By dropping a vanilla library into /usr/lib and an SSE2-enabled build into /usr/lib/sse2, applications can get the version of the library appropriate for the CPU in use. But there don't seem to be any defined hardware capabilities for x86_64.
Has anybody already thought this through? What's the best approach to take? For this package, the speedups are substantial, so this is worth doing, if it can be done well.
Thank you,
On 04/01/2016 10:32 PM, Jerry James wrote:
Yesterday I sent a patch upstream that uses gcc's indirect function support together with __attribute__((target ...)) to build vanilla x86_64, PCLMUL-enabled, AVX-enabled, and FMA-enabled varieties of several functions. Upstream was initially excited about this but then, on further reflection, offered the opinion that this approach is dangerous. The problem is that some of the types involved may change ABI depending on the instruction set in use, and therefore it would be necessary to build larger portions of the library for each supported CPU variant.
Do these types leak across the library boundary, to applications using the library?
Florian
On 01.04.2016 at 22:32, Jerry James wrote:
I am one of the maintainers of the ntl package, which is used by some numeric applications (e.g., Macaulay2 and sagemath). Upstream supports use of the PCLMUL instruction, the AVX instructions, and the FMA instructions to speed up various computations. We can't use any of those in Fedora, since we have to support a baseline x86_64.
Note that FMA may affect precision, because a fused multiply-add rounds once instead of twice, so tests that depend on exact floating-point results can fail.
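A small sketch of the effect, assuming GCC or Clang on x86_64. The function names and the input values are chosen purely for illustration and have nothing to do with ntl's test suite. The inputs are picked so that the exact product is 1 - 2^-54, which rounds to 1.0 when stored as a double.

```c
#include <immintrin.h>

/* Multiply, round to double, then add and round again (two roundings). */
static double muladd_separate(double a, double b, double c)
{
    volatile double product = a * b;  /* volatile keeps the compiler from fusing */
    return product + c;
}

/* One fused operation: a*b + c is rounded only once.
   Must only be called on CPUs that support FMA. */
__attribute__((target("fma")))
static double muladd_fused(double a, double b, double c)
{
    return _mm_cvtsd_f64(_mm_fmadd_sd(_mm_set_sd(a), _mm_set_sd(b), _mm_set_sd(c)));
}

/* Returns 1 if the two results differ, 0 if they agree,
   and -1 if this CPU has no FMA and the comparison cannot be made. */
static int fma_changes_result(void)
{
    const double a = 1.0 + 0x1p-27;
    const double b = 1.0 - 0x1p-27;  /* exact product: 1 - 2^-54 */

    if (!__builtin_cpu_supports("fma"))
        return -1;
    return muladd_separate(a, b, -1.0) != muladd_fused(a, b, -1.0);
}
```

With these inputs the separate version returns 0.0, while the fused version keeps the low-order bits of the product and returns -2^-54. A test that compares against a value computed without FMA would see that difference.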
On Fri, 2016-04-01 at 14:32 -0600, Jerry James wrote:
I am one of the maintainers of the ntl package, which is used by some numeric applications (e.g., Macaulay2 and sagemath). Upstream supports use of the PCLMUL instruction, the AVX instructions, and the FMA instructions to speed up various computations. We can't use any of those in Fedora, since we have to support a baseline x86_64.
Well, that's kind of a downer. I could advertise that people with newer CPUs ought to rebuild the ntl package for their own CPUs, but what's a distribution for if people have to rebuild packages? I've been looking for a way to automatically support more recent CPUs.
[...]
Has anybody already thought this through? What's the best approach to take? For this package, the speedups are substantial, so this is worth doing, if it can be done well.
In crypto libraries and GMP, these optimizations are enabled at runtime using the CPUID information. That is, they check the CPU capabilities when the application or library loads and override the functions for specific functionality (e.g., with function pointers). Is that something that ntl upstream could use?
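A rough sketch of that pattern, assuming GCC on x86_64. This is not how GMP or any particular crypto library actually implements it. The names (vec_add, add_baseline, add_avx, add_impl, select_impl) are hypothetical, and the GCC builtin __builtin_cpu_supports stands in for a hand-written CPUID query.

```c
#include <stddef.h>

/* Baseline implementation, safe on any x86_64 CPU. */
static void add_baseline(double *dst, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Same loop, compiled with AVX allowed for this function only. */
__attribute__((target("avx")))
static void add_avx(double *dst, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Starts at the baseline so calls made before selection are still safe. */
static void (*add_impl)(double *, const double *, const double *, size_t) = add_baseline;

/* Runs when the library (or program) is loaded and picks the best variant. */
__attribute__((constructor))
static void select_impl(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        add_impl = add_avx;
}

/* Public entry point: one indirect call through the selected pointer. */
void vec_add(double *dst, const double *a, const double *b, size_t n)
{
    add_impl(dst, a, b, n);
}
```

The cost is one indirect call per invocation, so this works best when the overridden functions do a meaningful amount of work per call.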
regards, Nikos
Hi Jerry,
I'm kind of resurrecting a dead thread here, but maybe a recent change to GCC has made what you were considering possible. I stumbled upon it when looking for information about optimized extensions for another package.
Recently, GCC expanded support for multiversioning. It's in GCC 6, which is included in F24. You can now write one function and have it multiversioned. This includes support for C as well as C++ (as opposed to C++ only), works on x86_64 as well, and you only write one version of the function (which should avoid those ABI changes you mentioned). It's currently limited to AVX(2) from what I understand, so it doesn't help me, but maybe you'll find it useful. There's also a nifty tool to identify areas that can be sped up with the new instructions.
More info here: http://lwn.net/Articles/691932/
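As a minimal sketch of the single-source multiversioning described above, GCC 6 and later accept a target_clones attribute. The function sum_squares and the chosen list of targets are only an illustration, not something taken from ntl.

```c
#include <stddef.h>

/* The compiler emits one clone per listed target plus a dispatcher
   that picks among them at load time; "default" is the baseline build. */
__attribute__((target_clones("avx2", "avx", "default")))
double sum_squares(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}
```

Callers use sum_squares(x, n) like any other function. Whether the clones are actually faster depends on how well the compiler vectorizes the body at the optimization level in use.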
(I didn't fully understand from your message which implementation of multiversioning you were using, so if you're already using the new one, please disregard!)
Take care! Matt