Intersecting Intel & AMD Instruction Set Extensions

Tim Janik


In some of my projects, I recently needed FMA (fused multiply-add) or AVX instructions. By default, compiling C/C++ for x86_64 enables only MMX and a few of the early SSE extensions (SSE and SSE2). This baseline instruction set predates even the Core 2, which was introduced in 2006.

Math-heavy and vectorized code can benefit greatly from more modern extensions like SSE4*, FMA, AVX, AVX2, etc., but because of the way the -march compiler option works, these are not easily enabled for all CPU types of similar age:

-march=native enables all the bells and whistles of the compiling CPU, which is good for locally compiled programs, e.g. for number crunching, but the resulting program is unlikely to run on a CPU of similar age from a different manufacturer.
-march=bdver4 runs on the last AMD Bulldozer variant and later AMD CPUs, but may use FMA4 or SSE4A, which are AMD specific and not supported on Intel CPUs.
-march=haswell runs on all Intel CPUs from the last 7 years, but allows the compiler to use extensions such as HLE that Bulldozer CPUs do not support.

With the -Q --help=target options, GCC can tell what specific features are enabled, given a specific architecture (excerpt):

gcc -Q --help=target -march=haswell
  -mavx                                 [enabled]
  -mavx2                                [enabled]
  -mbmi2                                [enabled]
  -mssse3                               [enabled]
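The comparison between two architectures can also be scripted. The following is a minimal shell sketch, not necessarily the procedure used for this post: the function names enabled_flags and common_flags are made up for illustration, and it assumes a GCC that targets x86 and marks active options with [enabled] as in the excerpt above.

```shell
#!/bin/sh
# Print the option names that a `gcc -Q --help=target` listing marks as
# [enabled], one per line, read from standard input.
enabled_flags() {
    awk '$2 == "[enabled]" { print $1 }' | sort -u
}

# Print the options enabled for both -march values given as $1 and $2.
# Each list is already de-duplicated, so lines that occur twice in the
# combined stream are exactly the options common to both.
common_flags() {
    {
        gcc -Q --help=target -march="$1" 2>/dev/null | enabled_flags
        gcc -Q --help=target -march="$2" 2>/dev/null | enabled_flags
    } | sort | uniq -d
}

# Example: options shared by Haswell and Excavator (skipped without gcc).
if command -v gcc >/dev/null 2>&1; then
    common_flags haswell bdver4
fi
```

The resulting list depends on the GCC version, so it is best treated as a starting point to be checked against the release dates of the CPUs in question.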

Using the Wikipedia lists of AMD’s APU release dates and the Intel Core release dates together with the GCC X86 instruction set options, we can determine which options are common to AMD and Intel CPUs of a certain age and are available to compile binaries with modern instruction sets that run on both brands.

As a result, here is a Makefile snippet for GCC/Clang that picks up all recent extensions which should run on all AMD CPUs since 2015 and Intel CPUs since 2013. The snippet sets up OPTIMIZE (to be added to CFLAGS/CXXFLAGS) with all instruction set extensions supported by both Haswell and the latest Bulldozer variant. If /proc/cpuinfo does not list the bmi2 flag, the compiling CPU is older than that, and the snippet falls back to a conservative baseline instead:

uname_M        ::= $(shell uname -m 2>/dev/null || echo None)
# AMD64 / X86_64 optimizations
ifeq ($(uname_M),x86_64)
  # Use haswell (Intel) and bdver4 (AMD Excavator Family 15h)
  # instruction sets, plus 2015 era tuning
  ARCHX64_2015 ::= -march=core2 -mtune=skylake
  ARCHX64_2015  += -mavx -mavx2 -mbmi -mbmi2 -mf16c -mfma -mfsgsbase
  ARCHX64_2015  += -mlzcnt -mmovbe -mpclmul -mpopcnt
  ARCHX64_2015  += -mrdrnd -msse4 -msse4.1 -msse4.2 -mxsave -mxsaveopt
  proc/cpuinfo ::= $(file < /proc/cpuinfo)
  ifeq ($(firstword $(filter bmi2, $(proc/cpuinfo))),bmi2)
    OPTIMIZE    += $(ARCHX64_2015)
  else
    OPTIMIZE    += -mcx16             # CMPXCHG16B, AMD64 2005
    OPTIMIZE    += -mmmx -msse -msse2 # Intel 2001, AMD 2003
    OPTIMIZE    += -msse3             # Intel 2004, AMD 2005
    OPTIMIZE    += -mssse3            # Intel 2006, AMD 2011
    # OPTIMIZE  += -msse4a            # AMD only, 2007
    # OPTIMIZE  += -msse4.1 -msse4.2  # Intel 2008, AMD 2011
    # OPTIMIZE  += -mavx              # Intel 2011, AMD 2011
    # OPTIMIZE  += -mavx2             # Intel 2013, AMD 2015
  endif
endif
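The CPU check from the snippet can also be tried outside of Make, for instance to see which branch a given build machine would take. This is a rough shell sketch under a few assumptions: has_cpu_flag is a hypothetical helper, /proc/cpuinfo is Linux specific, and only the first "flags" line is inspected, whereas the Makefile filters all words of the file.

```shell
#!/bin/sh
# Succeed if CPU feature $1 is listed on the first "flags" line of the
# cpuinfo file $2 (defaults to /proc/cpuinfo, which is Linux specific).
has_cpu_flag() {
    sed -n '/^flags[[:space:]]*:/{s/^[^:]*://;p;q;}' "${2:-/proc/cpuinfo}" 2>/dev/null |
        tr -s ' \t' '\n\n' | grep -qxF -- "$1"
}

# Report which branch of the Makefile snippet this machine would take.
if has_cpu_flag bmi2; then
    echo "bmi2 present: the 2015 flag set would be used"
else
    echo "bmi2 missing: the baseline flag set would be used"
fi
```

Matching whole words matters here, because bmi1 and bmi2 are distinct flags and a plain substring search for bmi would hit both.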

GNU Make versions before 4.2 do not yet support the file input operator syntax. With such a version, peeking at the compiling CPU type can be done with this change:

-  proc/cpuinfo ::= $(file < /proc/cpuinfo)
+  proc/cpuinfo ::= $(shell cat /proc/cpuinfo)
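To find out which of the two variants applies, the installed Make version can be compared against 4.2. The sketch below is one possible way to do that in shell; version_ge and make_version are illustrative names, and it assumes that `sort -V` (a GNU coreutils extension) is available and that the first line of `make --version` reads "GNU Make X.Y".

```shell
#!/bin/sh
# Succeed if dotted version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), a GNU coreutils extension.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from the first line of `make --version`,
# which GNU Make prints as "GNU Make X.Y[.Z]".
make_version() {
    make --version 2>/dev/null | sed -n '1s/^GNU Make \([0-9.]*\).*/\1/p'
}

# Suggest the variant that fits the installed Make. If no GNU Make is
# found, the version is empty and the shell variant is suggested.
if version_ge "$(make_version)" 4.2; then
    echo 'use $(file < /proc/cpuinfo)'
else
    echo 'use $(shell cat /proc/cpuinfo)'
fi
```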

Requiring a CPU no older than 5-7 years should strike an acceptable balance between making use of hardware acceleration and dropping support for older CPUs in SIMD-hungry applications such as Beast.
