GCC offers a generic tuning option for Arm these days, but we select -mtune=cortex-a8 instead.
Is this still a good choice?
Thanks, Florian
On Thu, Jan 25, 2018 at 2:46 PM, Florian Weimer fweimer@redhat.com wrote:
> GCC offers a generic tuning option for Arm these days, but we select -mtune=cortex-a8 instead.
> Is this still a good choice?
I suspect the generic tuning is likely the better choice. Are there any details about it anywhere? Basically, Cortex-A8 is pretty much the lowest common denominator for ARMv7.
Peter
On 01/25/2018 03:52 PM, Peter Robinson wrote:
> I suspect the generic tuning is likely the better choice. Are there any details about it anywhere? Basically, Cortex-A8 is pretty much the lowest common denominator for ARMv7.
The generic tuning has this:
/* Generic Cortex tuning. Use more specific tunings if appropriate. */
const struct tune_params arm_cortex_tune =
{
  &generic_extra_costs,
  &generic_addr_mode_costs,   /* Addressing mode costs. */
  NULL,                       /* Sched adj cost. */
  arm_default_branch_cost,
  &arm_default_vec_cost,
  1,                          /* Constant limit. */
  5,                          /* Max cond insns. */
  8,                          /* Memset max inline. */
  2,                          /* Issue rate. */
  ARM_PREFETCH_NOT_BENEFICIAL,
  tune_params::PREF_CONST_POOL_FALSE,
  tune_params::PREF_LDRD_FALSE,
  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE,   /* Thumb. */
  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE,   /* ARM. */
  tune_params::DISPARAGE_FLAGS_NEITHER,
  tune_params::PREF_NEON_64_FALSE,
  tune_params::PREF_NEON_STRINGOPS_FALSE,
  tune_params::FUSE_NOTHING,
  tune_params::SCHED_AUTOPREF_OFF
};
The Cortex-A8 tuning is:
const struct tune_params arm_cortex_a8_tune =
{
  &cortexa8_extra_costs,
  &generic_addr_mode_costs,   /* Addressing mode costs. */
  NULL,                       /* Sched adj cost. */
  arm_default_branch_cost,
  &arm_default_vec_cost,
  1,                          /* Constant limit. */
  5,                          /* Max cond insns. */
  8,                          /* Memset max inline. */
  2,                          /* Issue rate. */
  ARM_PREFETCH_NOT_BENEFICIAL,
  tune_params::PREF_CONST_POOL_FALSE,
  tune_params::PREF_LDRD_FALSE,
  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE,   /* Thumb. */
  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE,   /* ARM. */
  tune_params::DISPARAGE_FLAGS_NEITHER,
  tune_params::PREF_NEON_64_FALSE,
  tune_params::PREF_NEON_STRINGOPS_TRUE,
  tune_params::FUSE_NOTHING,
  tune_params::SCHED_AUTOPREF_OFF
};
Apart from PREF_NEON_STRINGOPS, which is TRUE only for Cortex-A8, the two structs are identical. The real difference is in generic_extra_costs vs. cortexa8_extra_costs, which are too large to include here. One of the differences seems to be that on Cortex-A8, floating-point multiply and divide are considered relatively more expensive, if I read the sources correctly. But this is all a bit of black magic.
Thanks, Florian
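One practical way to see whether the cost-table difference matters is to compile the same small floating-point function with each tuning and diff the generated assembly. The function below is a hypothetical test kernel (normalize_sum is not from GCC or Fedora); it mixes multiplies and a divide, the operations whose relative costs reportedly differ between the two tables. Treat it as a sketch for experimentation, not a benchmark.

```c
#include <stddef.h>

/* Hypothetical test kernel: scale each element, accumulate, then divide by
   the element count.  FP multiply and divide are the operations whose
   relative costs reportedly differ between cortexa8_extra_costs and
   generic_extra_costs, so this is a reasonable place to look for
   differences in the generated code. */
double normalize_sum(const double *v, size_t n, double scale)
{
  double s = 0.0;

  for (size_t i = 0; i < n; i++)
    s += v[i] * scale;               /* one FP multiply per element */

  /* Guard the empty case instead of dividing by zero. */
  return n ? s / (double) n : 0.0;   /* one FP divide */
}
```

Usage sketch; the cross-compiler name and the -march/-mfpu/-mfloat-abi values are assumptions for a hard-float ARMv7 target, not necessarily the flags Fedora uses:

    arm-linux-gnueabihf-gcc -O2 -S -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard -mtune=cortex-a8 -o a8.s kernel.c
    arm-linux-gnueabihf-gcc -O2 -S -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard -mtune=generic-armv7-a -o generic.s kernel.c
    diff a8.s generic.s

An empty diff would only show that this particular kernel is insensitive to the tuning, not that the two tunings are equivalent in general.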
On Thu, Jan 25, 2018 at 3:02 PM, Florian Weimer fweimer@redhat.com wrote:
> The generic tuning has this:
So, reading the GCC docs [1], it seems that -mtune=generic-armv7-a makes sense.
To quote the documentation for -mtune=generic-arch, GCC "should tune the performance for a blend of processors within architecture arch. The aim is to generate code that run well on the current most popular processors, balancing between optimizations that benefit some CPUs in the range, and avoiding performance pitfalls of other CPUs."
We still support a number of Cortex-A8 devices, but we have a lot more Cortex-A7/A9/A15 devices these days too, so I think generic tuning makes sense here.
Thanks, Peter
[1] https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
On Thu, Jan 25, 2018 at 3:13 PM, Peter Robinson pbrobinson@gmail.com wrote:
> We still support a number of Cortex-A8 devices, but we have a lot more Cortex-A7/A9/A15 devices these days too, so I think generic tuning makes sense here.
I also wonder whether it's worthwhile using -mfpu=neon-vfpv3. I can't tell from the docs, though, whether on an SoC without NEON the code will fall back to VFPv3 or just fail altogether.
Peter
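On the fallback question: GCC emits code for the FPU selected at build time, so a binary built with -mfpu=neon-vfpv3 may contain NEON instructions anywhere and would typically die with SIGILL on an SoC without NEON; there is no automatic fallback to VFPv3. The usual alternative is to build for VFPv3 only and pick a NEON code path at run time. A minimal sketch of the detection half, assuming 32-bit Arm Linux with glibc 2.16 or later (for getauxval) and the kernel's HWCAP_NEON bit; cpu_has_neon and selected_fp_path are hypothetical helper names.

```c
#include <stdbool.h>

#if defined(__arm__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (glibc 2.16+) */
#include <asm/hwcap.h>  /* HWCAP_NEON on 32-bit Arm Linux */
#endif

/* Hypothetical helper: true if the kernel reports NEON for the running CPU.
   Always false when not built for 32-bit Arm Linux, so callers take the
   plain (non-NEON) path. */
bool cpu_has_neon(void)
{
#if defined(__arm__) && defined(__linux__) && defined(HWCAP_NEON)
  return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
  return false;
#endif
}

/* Hypothetical dispatcher: names the code path that would be chosen.
   The NEON path itself would live in a separate file built with
   -mfpu=neon-vfpv3 and is not shown; "vfp" means the non-NEON fallback. */
const char *selected_fp_path(void)
{
  return cpu_has_neon() ? "neon" : "vfp";
}
```

The design choice here is that only the small, explicitly dispatched file is built with NEON enabled, so the rest of the distribution keeps running on NEON-less boards.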
On 01/27/2018 01:03 PM, Peter Robinson wrote:
> I also wonder whether it's worthwhile using -mfpu=neon-vfpv3. I can't tell from the docs, though, whether on an SoC without NEON the code will fall back to VFPv3 or just fail altogether.
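I have already commented on this in various other places. NEON only offers 32-bit floats with non-IEEE semantics (denormals are flushed to zero), so it's only applicable with manual tweaking. Even auto-vectorization at -O3 won't use it for floating-point code, because of the non-IEEE behavior, unless -funsafe-math-optimizations is also specified.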
I don't think it's worth making the switch, even if we could somehow verify that it wouldn't impact board support.
Thanks, Florian
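To make "non-IEEE" concrete: ARMv7 NEON floating-point arithmetic always runs in flush-to-zero mode, so subnormal single-precision results become zero, whereas ordinary scalar VFP code normally keeps them as IEEE 754 requires. The sketch below uses only standard C99 (<float.h>, <math.h>) to produce such a value and runs on any host; halving_gives_subnormal is a hypothetical name.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: multiplies x by 0.5 and reports whether the result
   is subnormal.  FLT_MIN is the smallest normal float, so FLT_MIN * 0.5f is
   subnormal under IEEE 754.  A NEON multiply (flush-to-zero always on in
   ARMv7) would return 0.0f for the same operation instead. */
bool halving_gives_subnormal(float x)
{
  volatile float half = x * 0.5f;  /* volatile keeps the multiply at run time */
  return fpclassify(half) == FP_SUBNORMAL;
}
```

A vectorizer that silently moved such arithmetic onto NEON would change results like this one, which is why GCC requires explicit permission before doing so.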