On 6/16/21 12:19 PM, Daniel P. Berrangé wrote:
On Wed, Jun 16, 2021 at 12:01:29PM +0200, Hans de Goede wrote:
> On 6/16/21 10:28 AM, Daniel P. Berrangé wrote:
>> On Tue, Jun 15, 2021 at 05:34:02PM -0400, Neal Gompa wrote:
>>> Hey all,
>>> Earlier this week, I was helping with processing features for openSUSE
>>> Leap 15.4 and I discovered that they're planning on introducing
>>> x86_64-v2 to openSUSE soon. The reference for this change was that
>>> RHEL 9 is going to use x86_64-v2. Additionally, other distributions
>>> have been considering bumping up to v2 or v3.
>>> Some cursory examination of the new x86_64 sublevels seems to indicate
>>> that x86_64-v2 goes back to roughly 2007~2008, merely cutting off the
>>> first couple of generations of x86_64 CPUs from Intel and AMD. I
>>> personally don't have any computers that don't have support for
>>> x86_64-v2 anymore.
>> Yes, you lose primarily Intel Conroe and Penryn generations and
>> AMD Opteron Gen 1 -> Gen 3. I doubt this is a significant portion
>> of Fedora installs.
>> Slight tangent but I find Fedora's approach to hardware somewhat
>> at odds with our approach to software.
>> On the one hand we portray our project as a place for cutting
>> edge Linux software & innovation.
>> On the other hand we hold back our software by trying to keep
>> supporting long obsolete hardware.
>> There is of course always a balance between bumping min hardware
>> specs and the impact on maintainers & users, but I'm not convinced
>> that we have the balance right in targeting our x86_64 baseline at
>> the very first generation of 64-bit CPUs from 15 years ago. I can't
>> imagine such old CPUs make up a significant portion of our users.
> I don't know about that, all I can offer is my own anecdotal
> evidence to the contrary. Of the 7 PCs/laptops which are in more or
> less daily use in our household, 3 are still core2 duo systems.
> Once the core2 duo / amd64 machines came out we really started hitting
> the point of diminishing returns wrt PC performance for day 2 day
> use. For a lot of simple day2 day use there really is no reason
> to replace any x86_64-v1 capable machines unless they are
> actually broken.
> Perhaps more importantly though, we are also very
> much at the point where bumping the processor architecture
> requirements also leads to strongly diminishing returns.
> Also see Mateusz Jończyk's excellent reply in this thread, how
> rebuilding packages for x86_64-v2 vs x86_64 results in a barely
> measurable performance improvement.
> Of course there are some specific algorithms which greatly
> benefit from sse4.2, but those typically benefit even more
> from avx/avx2 which are not included in x86_64-v2; and often
> libraries already contain avx optimized code-paths for this
> which they automatically use where possible.
> You talk about we "hold back our software by trying to keep
> supporting long obsolete hardware". Let me flip the question:
> can you provide hard proof, as in concrete numbers showing
> significant improvements, that switching to x86_64-v2
> actually buys us anything meaningful?
I wasn't so much thinking about the performance benefit,
rather the CMPXCHG16B support which IIUC is required for
atomics on 128 bit quantities and isn't present in the
x86_64 baseline. QEMU already unconditionally adds -mcx16
to its CFLAGS to enable usage of this instruction.
CMPXCHG16B is indeed supported on pretty much any x86_64 machine,
including on Intel Conroe and Penryn AFAICT.
Only very old AMD64 CPUs, which are still using DDR1, don't
support this (AFAICT).
So yes requiring that is probably fine.
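
For anyone who wants to check a particular machine, here is a minimal sketch that reads the "flags" line of /proc/cpuinfo on Linux and reports whether cx16 (CMPXCHG16B) is there, plus which of the x86_64-v2 additions are missing. The flag spellings are the kernel's names ("pni" is SSE3, "lahf_lm" is LAHF/SAHF in 64-bit mode). The helper names `parse_flags` and `missing_for_v2` are made up for this illustration. It only looks at what v2 adds on top of the baseline, on the assumption that every x86_64 CPU already has the baseline features.

```python
# Flags that x86_64-v2 adds on top of the x86_64 baseline, spelled the way
# the Linux kernel prints them on the "flags" line of /proc/cpuinfo.
X86_64_V2_FLAGS = frozenset(
    {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
)


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_for_v2(flags):
    """Return the x86_64-v2 flags that are absent from `flags`, sorted."""
    return sorted(X86_64_V2_FLAGS - set(flags))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_flags(f.read())
    except OSError:
        # Not Linux, or /proc is unavailable: nothing can be reported.
        flags = set()
    print("cx16 (CMPXCHG16B):", "cx16" in flags)
    print("missing for x86_64-v2:", missing_for_v2(flags) or "nothing")
```

An empty "missing" list means the machine meets x86_64-v2 as far as the kernel-reported flags go.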
I did think there might be some performance benefits too,
so it was interesting to see the disappointing results
posted elsewhere in this thread.
Ack, I suspect that the cases where there are really significant
gains in using SSE4 are already covered by (optional) SSE4
optimized code paths.
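
To illustrate what an "optional optimized code path" means, here is a rough sketch of the runtime-dispatch pattern: pick an implementation once, based on the detected CPU flags, and fall back to a portable version otherwise. This is only a Python stand-in for the idea. Real libraries do this in C or assembly, and all three function names are invented for the example. `popcount_fast` is a placeholder and does not actually use the POPCNT instruction.

```python
def popcount_portable(x):
    """Count set bits of a non-negative int with a plain loop."""
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count


def popcount_fast(x):
    """Placeholder for a hand-optimized variant of the same function."""
    return bin(x).count("1")


def select_popcount(cpu_flags):
    """Choose an implementation once, based on the detected CPU flags."""
    return popcount_fast if "popcnt" in cpu_flags else popcount_portable
```

The caller keeps the function returned by `select_popcount()` and uses it from then on, so one build runs on old and new hardware alike.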
I think that if we want to make use of newer instructions, it
would be better to look into things like this.
E.g. if there is some library which does significantly benefit
from gcc's auto-vectorisation with sse4 and/or avx then we
could build it multiple times with different settings and use
the hwcaps based library loading mechanism to make an optimized
version get loaded on hw which supports it. This way we can even
use avx / x86_64-v3 / -v4 in cases where this actually is worth
the extra effort + diskspace.
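
As a sketch of how the hwcaps mechanism picks a build: glibc (2.33 and later) looks in glibc-hwcaps/<level> subdirectories of each library directory, trying the highest level the CPU supports first, before falling back to the directory itself. The code below only models that search order. It is not the loader's actual implementation, and `hwcaps_search_dirs`, `pick_library` and the `libfoo.so.1` name are hypothetical. The CPU level is passed in as a number 1..4 and is not detected here.

```python
import os
import posixpath

# glibc-hwcaps subdirectory names for x86_64, lowest level first.
LEVELS = ["x86-64-v2", "x86-64-v3", "x86-64-v4"]


def hwcaps_search_dirs(libdir, cpu_level):
    """List directories in the order tried for a CPU at `cpu_level`.

    `cpu_level` is 1..4; level 1 is the plain x86_64 baseline.
    """
    usable = LEVELS[: max(cpu_level - 1, 0)]
    # The most specific level goes first and the plain libdir comes last.
    dirs = [posixpath.join(libdir, "glibc-hwcaps", name)
            for name in reversed(usable)]
    dirs.append(libdir)
    return dirs


def pick_library(libdir, soname, cpu_level, exists=os.path.exists):
    """Return the first existing candidate path for `soname`, or None."""
    for d in hwcaps_search_dirs(libdir, cpu_level):
        candidate = posixpath.join(d, soname)
        if exists(candidate):
            return candidate
    return None
```

So a package could ship a baseline libfoo.so.1 in /usr/lib64 plus an AVX2 build under /usr/lib64/glibc-hwcaps/x86-64-v3/, and only machines at v3 or above would pick up the latter.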