Alan Cox wrote:
Likewise my own testing has always found that the Athlon really
doesn't care much how you order instructions. If you think about it,
AMD have spent years dealing with everyone optimising for the random
Intel processor of the year, and have adapted appropriately.
I have to agree with Alan on this. I assembled Athlon-based clusters
from the Athlon's introduction until a year ago, and the i686 (Pentium
Pro) optimizations get you very close to ideal on the Athlon32, within
about 5%.
Except on things like 3DNow! and prefetch stuff, the AMD really
doesn't seem to care.
Plus any floating-point code.
Athlon32/Athlon64/Opteron have 3 FPUs, two complex and one simple.
Pentium II through Pentium 4 have 2 FPUs, with which you can do either
one complex operation (while the other is idle) or two simple ones.
While the Athlon does a lot of run-time optimization via out-of-order
execution and register renaming, compile-time optimizations can still
affect floating-point performance by up to 40%.
But that's more of an application thing -- maybe it matters for only a
few glibc calls (?).
--
Bryan J. Smith, E.I. -- b.j.smith(a)ieee.org