Hi,
Sorry for the cross post and long email :-)
Currently I am working on a very early-stage build of Mandriva for ARM. Thanks to Jeff Johnson for giving me ssh access to armv7 hosts, and to Matthew Dawkins for building several Mandriva/Unity Linux armv5 packages.
What I am trying to understand now is the choice of float ABI. I understand that the IHI0042D_aapcs.pdf file I downloaded says to use VFP registers for float/double arguments, but softfp seems too good to miss, as armv5 should be around for some time yet.
So, I have two chroots, running:

softfp# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/armv7l-mandriva-linux-gnueabi/4.6.1/lto-wrapper
Target: armv7l-mandriva-linux-gnueabi
Configured with: /home/pcpa/bootstrap/rpmbuild/BUILD/gcc-4.6-20110722/configure --prefix=/usr --build=i586-mandriva-linux-gnu --host=armv7l-mandriva-linux-gnueabi --target=armv7l-mandriva-linux-gnueabi --enable-werror=no --enable-cxx --with-cpu=cortex-a8 --with-tune=cortex-a8 --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-abi=aapcs-linux --enable-languages=c,c++ --enable-threads=posix --disable-libssp --disable-libmudflap
Thread model: posix
gcc version 4.6.1 20110722 (Mandriva) (GCC)
thumb# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/armv7l-mandriva-linux-gnueabi/4.6.1/lto-wrapper
Target: armv7l-mandriva-linux-gnueabi
Configured with: /home/pcpa/bootstrap/rpmbuild/BUILD/gcc-4.6-20110722/configure --prefix=/usr --build=i586-mandriva-linux-gnu --host=armv7l-mandriva-linux-gnueabi --target=armv7l-mandriva-linux-gnueabi --enable-werror=no --enable-cxx --with-cpu=cortex-a8 --with-tune=cortex-a8 --with-arch=armv7-a --with-mode=thumb --with-float=hard --with-fpu=vfpv3-d16 --with-abi=aapcs-linux --enable-languages=c,c++ --enable-threads=posix --disable-libssp --disable-libmudflap
Thread model: posix
gcc version 4.6.1 20110722 (Mandriva) (GCC)
This is unmodified upstream gcc, built with a set of bootstrap scripts from a git branch I made on a checkout of:
git clone git://fedorapeople.org/~djdelorie/bootstrap.git
Since I am still very much an "arm noob" :-) and only did the thumb build yesterday to learn about thumb, my impression so far is that the best approach would be thumb+softfp.
Just so you know, I am running both thumb and arm builds: the thumb one uses hard float, and the softfp one uses the ARM instruction set:
softfp# objdump -d /usr/lib/libm.so | less
[...]
00008d30 <__ieee754_atan2>:
    8d30:  e3a0c000   mov   ip, #0
    8d34:  e347cff0   movt  ip, #32752    ; 0x7ff0
    8d38:  e92d4030   push  {r4, r5, lr}
    8d3c:  ed2d8b10   vpush {d8-d15}
    8d40:  e3a05000   mov   r5, #0
    8d44:  ec432b18   vmov  d8, r2, r3
    8d48:  e3475ff0   movt  r5, #32752    ; 0x7ff0
    8d4c:  e003c00c   and   ip, r3, ip
    8d50:  e15c0005   cmp   ip, r5
    8d54:  e24dd02c   sub   sp, sp, #44   ; 0x2c
    8d58:  e1a04003   mov   r4, r3
    8d5c:  ec410b19   vmov  d9, r0, r1
    8d60:  e1a05002   mov   r5, r2
    8d64:  0a000022   beq   8df4 <__ieee754_atan2+0xc4>
[...]
thumb# objdump -d /usr/lib/libm.so | less
[...]
00007884 <__ieee754_atan2>:
    7884:  2100        movs  r1, #0
    7886:  2000        movs  r0, #0
    7888:  f6c7 71f0   movt  r1, #32752    ; 0x7ff0
    788c:  ec53 2b11   vmov  r2, r3, d1
    7890:  f6c7 70f0   movt  r0, #32752    ; 0x7ff0
    7894:  4019        ands  r1, r3
    7896:  4281        cmp   r1, r0
    7898:  e92d 03f0   stmdb sp!, {r4, r5, r6, r7, r8, r9}
    789c:  ed2d 8b10   vpush {d8-d15}
    78a0:  461c        mov   r4, r3
    78a2:  b08a        sub   sp, #40   ; 0x28
    78a4:  eeb0 8b41   vmov.f64 d8, d1
    78a8:  4616        mov   r6, r2
    78aa:  eeb0 9b40   vmov.f64 d9, d0
    78ae:  d03c        beq.n 792a <__ieee754_atan2+0xa6>
[...]
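Both listings start with the same check: `mov`/`movt` build the constant 0x7ff00000, which is ANDed with the high word of the second argument and compared against 0x7ff00000, i.e. "are all exponent bits set (infinity or NaN)?". In the softfp build that high word already sits in r3, while the hard-float build first copies it out of d1 with `vmov r2, r3, d1`. The test itself can be sketched in C as follows; the function names are made up for illustration and this is not glibc's actual source.

```c
#include <stdint.h>
#include <string.h>

/* Upper 32 bits of an IEEE 754 double: sign bit, 11-bit exponent and the
   top 20 mantissa bits. memcpy avoids strict-aliasing problems and gives
   the same answer on little- and big-endian hosts. */
uint32_t double_high_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits >> 32);
}

/* Equivalent of "movt ip, #32752" (ip = 0x7ff00000), "and", "cmp":
   true when every exponent bit is set, i.e. x is an infinity or a NaN. */
int exponent_all_ones(double x)
{
    return (double_high_word(x) & 0x7ff00000u) == 0x7ff00000u;
}
```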
I am trying to figure out what "The Industry" says about it, and just checked the Linaro gcc-4.6 changes relevant to me right now, which are:
+	--with-arch=armv7-a --with-tune=cortex-a8 \
+	--with-float=$(float_abi) --with-fpu=neon \
+# check if we're building for armel or armhf
+ifeq ($(DEB_TARGET_ARCH),armhf)
+  float_abi := hard
+else ifneq (,$(filter $(DEB_TARGET_ARCH), arm armel))
+  float_abi := softfp
+endif
If I understand correctly, neon will have better support for simd instructions, right?
Either way, I used two simple benchmarks to try to sell myself on the idea of breaking compatibility with armv5 and older binaries, and I am still not convinced; but, as I said, we should use whatever "The Industry" chooses :-) For benchmarks I used http://www.tux.org/~mayer/linux/bmark.html and http://www.linuxfordevices.com/c/a/Linux-For-Devices-Articles/Why-ARMs-EABI-... and also compared against my home computer, a quad-core i5 x86_64. Results are attached.
Thanks, and again sorry for the cross-posting and long email,
Paulo
What I am trying to understand now is about choice of float abi.
Not much to understand - each project chooses the abi that best meets their goals. If you want to learn the history, it's all in the mail archives.
git clone git://fedorapeople.org/~djdelorie/bootstrap.git
Since I am still very "arm noob" :-) and just yesterday did the thumb build to learn about thumb, so far, my impression is that the best approach should be to use thumb+softfp.
If you want to do that, you don't need my bootstrap scripts. The whole *point* of a bootstrap was to bring up an *incompatible* abi from scratch. If you want to use a compatible abi, just keep using the armv5 version of Fedora instead. It was decided long ago that the armv7 version of Fedora would use the hardfp abi (hence the project name "hardfp bootstrap"), but you can't build hardfp binaries on a softfp platform, so we had to start from scratch to do hardfp.
It's also a fun exercise in bootstrapping, to make sure we still can do it.
I am kind of trying to figure what "The Industry" says about it,
If you need someone else's approval, you've missed the point of Free Software. Each project has their own goals, and there is no "The Industry" to tell us what to do. If you want to be part of a project, find the one that has the same goals as you do, and join them.
If I understand correctly, neon will have better support for simd instructions right?
There are still some armv7 chips that don't have neon, though, so we (Fedora) chose to avoid neon for now.
On Monday, August 01, 2011 12:35:06 PM DJ Delorie wrote:
What I am trying to understand now is about choice of float abi.
Not much to understand - each project chooses the abi that best meets their goals. If you want to learn the history, it's all in the mail archives.
git clone git://fedorapeople.org/~djdelorie/bootstrap.git
Since I am still very "arm noob" :-) and just yesterday did the thumb build to learn about thumb, so far, my impression is that the best approach should be to use thumb+softfp.
If you want to do that, you don't need my bootstrap scripts. The whole *point* of a bootstrap was to bring up an *incompatible* abi from scratch. If you want to use a compatible abi, just keep using the armv5 version of Fedora instead. It was decided long ago that the armv7 version of Fedora would use the hardfp abi (hence the project name "hardfp bootstrap"), but you can't build hardfp binaries on a softfp platform, so we had to start from scratch to do hardfp.
We decided to keep using soft rather than softfp on armv5 because softfp, while it can use a hardware floating point unit if one is available, has the extra overhead of working out at runtime whether it has a hardware floating point unit or not. By making the distinct v7 port and using hardfp, we gain the speed of the hardware floating point unit without the runtime overhead. But since softfp and soft are compatible, you could just build using Fedora as a base, or any other ABI-compatible distro.
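For reference, soft and softfp pass floating-point arguments the same way (in core registers), which is why objects built with them can be mixed, while hard passes them in VFP registers and cannot be linked with either. The choice is fixed per object at compile time, and GCC exposes it through predefined macros. A small sketch that reports what a toolchain defaults to; the macro meanings in the comments are to the best of my knowledge and worth checking against the GCC and ACLE documentation.

```c
/* Which float ABI was this translation unit compiled for? Assumed GCC
   behaviour: __ARM_PCS_VFP for -mfloat-abi=hard, __SOFTFP__ for
   -mfloat-abi=soft, neither of them for -mfloat-abi=softfp. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__ARM_PCS_VFP)
    return "hard";   /* VFP instructions, FP arguments in VFP registers */
#elif defined(__SOFTFP__)
    return "soft";   /* no FP instructions, FP done by library calls */
#else
    return "softfp"; /* VFP instructions, FP arguments in core registers */
#endif
}

/* 1 if the compiler was allowed to emit NEON code (e.g. -mfpu=neon), else 0. */
int neon_enabled_at_compile_time(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

Compiling this in each chroot without extra flags should show which default `--with-float` selected.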
It's also a fun exercise in bootstrapping, to make sure we still can do it.
I am kind of trying to figure what "The Industry" says about it,
If you need someone else's approval, you've missed the point of Free Software. Each project has their own goals, and there is no "The Industry" to tell us what to do. If you want to be part of a project, find the one that has the same goals as you do, and join them.
If I understand correctly, neon will have better support for simd instructions right?
There are still some armv7 chips that don't have neon, though, so we (Fedora) chose to avoid neon for now.
Marvell and NVIDIA armv7 chips don't have NEON, which includes the XO-1.75.
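Because NEON is optional on armv7, code that wants to use it when present has to check at runtime. On 32-bit ARM Linux the `Features` line of /proc/cpuinfo lists `neon` on cores that have it. A minimal sketch of that check follows; the function names are hypothetical, and newer glibc also offers `getauxval(AT_HWCAP)` with `HWCAP_NEON`, which avoids text parsing.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the "Features" line of /proc/cpuinfo-style text lists
   `feature` as a whole word, else 0. */
int cpuinfo_has_feature(const char *text, const char *feature)
{
    size_t flen = strlen(feature);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len >= 8 && strncmp(line, "Features", 8) == 0) {
            const char *colon = memchr(line, ':', len);
            const char *stop = line + len;
            const char *p = colon ? colon + 1 : stop;

            while (p < stop) {
                while (p < stop && (*p == ' ' || *p == '\t'))
                    p++;                      /* skip separators */
                const char *word = p;
                while (p < stop && *p != ' ' && *p != '\t')
                    p++;                      /* find end of word */
                if ((size_t)(p - word) == flen && strncmp(word, feature, flen) == 0)
                    return 1;
            }
        }
        line = end ? end + 1 : NULL;
    }
    return 0;
}

/* Read /proc/cpuinfo and look for "neon"; 0 if the file is unreadable. */
int system_has_neon(void)
{
    static char buf[65536];
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return cpuinfo_has_feature(buf, "neon");
}
```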
Dennis
Quoting DJ Delorie dj@redhat.com:
It's also a fun exercise in bootstrapping, to make sure we still can do it.
I haven't looked at the docs in a while, but we are most likely going to need this again in the distant future. Plus the fact it seems to come up all the time. It would be appropriate to have it overly documented. (In my definition, "overly" includes stupid details, like "if xyz happens, you screwed up step 113". Your definition may vary. :) )
If I understand correctly, neon will have better support for simd instructions right?
There are still some armv7 chips that don't have neon, though, so we (Fedora) chose to avoid neon for now.
IIRC NEON isn't a requirement for armv7 but a VFP unit is. It is a good choice.
NEON is a SIMD unit; however, code needs tweaking to use NEON, so that would be a great place to volunteer if you are looking for something. There are a hundred other places to help too, including testing and documentation. :)
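One common way to do that tweaking is with GCC's arm_neon.h intrinsics, keeping a plain C path so the same source still builds for chips without NEON. A sketch assuming a simple element-wise float add; the function name is hypothetical.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: four single-precision lanes per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar path: the whole array on non-NEON builds, the tail otherwise. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Built with `-mfpu=neon` the first loop handles four floats per iteration; built without it, only the scalar loop is compiled. Keep in mind that NEON single-precision arithmetic flushes denormals to zero, so results can differ from VFP in edge cases.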
I haven't looked at the docs in a while, but we are most likely going to need this again in the distant future. Plus the fact it seems to come up all the time.
One of the things I'm hoping we get out of this is an official fully automated way to bootstrap, make it fast, and do it often. Then it will always be there when we need it.
Quoting DJ Delorie dj@redhat.com:
I haven't looked at the docs in a while, but we are most likely going to need this again in the distant future. Plus the fact it seems to come up all the time.
One of the things I'm hoping we get out of this is an official fully automated way to bootstrap, make it fast, and do it often. Then it will always be there when we need it.
That would be entirely useful. :P
I'm still liking the idea of documentation. I think there are a lot of people who don't understand the whole process, especially the why and how.
On Mon, Aug 01, 2011 at 02:34:56PM -0400, DJ Delorie wrote:
I haven't looked at the docs in a while, but we are most likely going to need this again in the distant future. Plus the fact it seems to come up all the time.
One of the things I'm hoping we get out of this is an official fully automated way to bootstrap, make it fast, and do it often. Then it will always be there when we need it.
Sounds like an excellent idea. :-)
We're working on a similar goal within Debian too.
Cheers,
On August 1, 2011 14:35, DJ Delorie dj@redhat.com wrote:
What I am trying to understand now is about choice of float abi.
Not much to understand - each project chooses the abi that best meets their goals. If you want to learn the history, it's all in the mail archives.
I am using some local branches of your scripts to build several combinations of armv7 chroots, stopping at stage2 and building a few rpms:
(I call it hardfp for easier understanding; vfpv3-d16 is used where neon is omitted)
arm+hardfp
arm+softfp
thumb+hardfp
thumb+softfp
thumb+hardfp+neon
thumb+softfp+neon
As far as I can tell, neon generates "prettier" objdump output when looking at libm.so, but the runtime of simple benchmarks does not show any difference.
git clone git://fedorapeople.org/~djdelorie/bootstrap.git
Since I am still very "arm noob" :-) and just yesterday did the thumb build to learn about thumb, so far, my impression is that the best approach should be to use thumb+softfp.
If you want to do that, you don't need my bootstrap scripts. The whole *point* of a bootstrap was to bring up an *incompatible* abi from scratch. If you want to use a compatible abi, just keep using the armv5 version of Fedora instead. It was decided long ago that the armv7 version of Fedora would use the hardfp abi (hence the project name "hardfp bootstrap"), but you can't build hardfp binaries on a softfp platform, so we had to start from scratch to do hardfp.
Actually, I now know that I was also partially confused by misunderstanding the --with-float=hard ABI, so I wrote a simple program to better understand the calling convention being generated. For some reason I was thinking that it would use only two VFP registers for arguments, but it can use up to 8. But using the softfp convention for variadic functions may be tough for some applications; I wrote two "initial state" JITs for arm:
https://github.com/pcpa/lightning/tree/master/lightning/arm
and direct links to the other, as it is not in a single project:
https://code.google.com/p/exl/source/browse/trunk/lib/ejit_arm-cpu.c
https://code.google.com/p/exl/source/browse/trunk/lib/ejit_arm-swf.c
https://code.google.com/p/exl/source/browse/trunk/lib/ejit_arm-vfp.c
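The difference between the two conventions for double arguments fits in a small model: under the VFP variant (hard) doubles go in d0-d7, while under the base standard (soft/softfp) each double takes an even/odd core register pair, so only r0:r1 and r2:r3 are available before the stack. The sketch below only handles all-double argument lists; floats, integers and variadic calls (which always use the base rules) are left out, and the names are hypothetical.

```c
/* Simplified model of where N double arguments are placed.
   hard   (AAPCS VFP variant): d0-d7, then the stack.
   softfp (base AAPCS):        r0:r1, r2:r3, then the stack. */
struct arg_split {
    int in_registers;
    int on_stack;
};

struct arg_split place_double_args(int n, int hard_float)
{
    /* Each double needs one D register, or one even/odd pair of core
       registers out of r0-r3. */
    int reg_slots = hard_float ? 8 : 2;
    struct arg_split s;

    s.in_registers = n < reg_slots ? n : reg_slots;
    s.on_stack = n - s.in_registers;
    return s;
}
```

For 8 doubles this gives 8 in registers with hard, versus 2 in registers and 6 on the stack with softfp.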
So today, after understanding the ABI better, I also made a simple test case that calls a function receiving 8 double arguments and returning one, 100 million times. I compiled it with -O0, because otherwise gcc just optimizes out the call sequence and all timings become identical. I noticed 20-25% faster execution in the case where it should make the most difference: 8 arguments in registers and the return value in a register, versus 2 arguments in r0-r3 that are then moved to VFP registers, 6 on the stack, and the same conversion again for the return value...
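A sketch of that kind of micro-benchmark, not the actual test program: instead of building with -O0, it marks the callee with GCC's `noinline` attribute and varies the arguments on each iteration, so the compiler can neither inline the call nor hoist it out of the loop. The function names are hypothetical.

```c
/* Eight double arguments and a double result: under hard float they travel
   in d0-d7 and the result in d0; under softfp two arguments use r0-r3, six
   go on the stack, and the result comes back in r0:r1. noinline is a GCC
   extension; the function is non-static on purpose, since GCC may use a
   private convention for functions local to the file. */
__attribute__((noinline))
double sum8(double a, double b, double c, double d,
            double e, double f, double g, double h)
{
    return a + b + c + d + e + f + g + h;
}

/* Call sum8() `iterations` times. The arguments depend on the loop counter
   so the call cannot be hoisted, and the accumulated result is returned so
   the loop cannot be discarded. */
double run_calls(long iterations)
{
    double acc = 0.0;
    for (long i = 0; i < iterations; i++) {
        double x = (double)(i & 7);
        acc += sum8(x, x + 1, x + 2, x + 3, x + 4, x + 5, x + 6, x + 7);
    }
    return acc;
}
```

Calling `run_calls(100000000)` from a small `main()` built once per chroot and timing it with the shell's `time` gives a hard vs softfp comparison of the call overhead.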
As Loïc Minier said in the other response (thanks!), this should mostly be an issue when calling functions in other libraries, where gcc cannot optimize much, and only if one passes 2-8 float/double arguments a lot in inner loops rather than in vectors...
It's also a fun exercise in bootstrapping, to make sure we still can do it.
With that I agree :-)
I am kind of trying to figure what "The Industry" says about it,
If you need someone else's approval, you've missed the point of Free Software. Each project has their own goals, and there is no "The Industry" to tell us what to do. If you want to be part of a project, find the one that has the same goals as you do, and join them.
I did not express myself clearly. Let me try again to describe the idea: since I am building packages for armv7 for Mandriva, we are better off sticking to what upstream does and supports (read "The Industry" as "upstream"; I personally can hack here and there, but not much else).
If I understand correctly, neon will have better support for simd instructions right?
There are still some armv7 chips that don't have neon, though, so we (Fedora) chose to avoid neon for now.
I have not learned much about it yet, but maybe using neon for integer division could be a "huge win", as otherwise there is no division instruction (well, not in arm mode)...
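For context: as far as I know, the Cortex-A8 has no integer divide instruction in either ARM or Thumb-2 state (SDIV/UDIV are only guaranteed on the v7-R/v7-M profiles and appear on later cores such as the Cortex-A15), and NEON has no integer divide either, so GCC turns `/` and `%` into calls to EABI helpers such as `__aeabi_uidiv` and `__aeabi_idiv` in libgcc. Those helpers are hand-tuned assembly; the plain shift-and-subtract loop below (hypothetical name) only illustrates the kind of work they do and is not their implementation.

```c
#include <stdint.h>

/* Restoring shift-and-subtract division, one quotient bit per iteration.
   The caller must ensure d != 0. If rem is non-NULL it receives n % d. */
uint32_t soft_udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0, r = 0;

    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((n >> bit) & 1u);   /* bring down the next bit */
        if (r >= d) {
            r -= d;
            q |= 1u << bit;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```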
Thanks, Paulo
I am using some local branches of your scripts to build several combinations of armv7 chroots, stopping at stage2 and building a few rpms:
Not its expected use case, but interesting anyway :-)
I did not learn much yet about it, but maybe using neon for integer division could be a "huge win",
It would be a huge loss on chips without neon, though.
Performance is *not* the only criterion we deal with.