In some of my projects, I’ve recently needed FMA (fused multiply-add) or AVX instructions. Compiling C/C++ on x86_64 by default only enables MMX and a few of the early SSE extensions. That instruction set basically predates the Core 2, which was introduced in 2006.
Math code and vectorization can benefit greatly from more modern instructions like SSE4*, FMA, AVX, AVX2, etc., but because of the way the -march compiler option works, those are not easily enabled for all CPU types of similar age.
-march=native enables all the bells and whistles of the compiling CPU, which is good for locally compiled programs that e.g. do number crunching, but the resulting program is unlikely to run on a CPU of similar age from a different manufacturer.
-march=bdver4 runs on the last AMD Bulldozer variant and later AMD CPUs, but uses FMA4 or SSE4A, which are AMD-specific and break on Intel CPUs.
-march=haswell runs on all Intel CPUs from the last 7 years, but is allowed to use HLE instructions not supported on Bulldozers, etc.
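A binary that uses an extension the running CPU lacks typically dies with an illegal instruction fault. As a minimal sketch of how a program could detect this up front, the following C function uses the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports. The function name cpu_supports_avx2_fma is made up for this illustration, and the choice of AVX2 plus FMA as the features to test is an assumption, not something prescribed above.

```c
/* Hypothetical helper: returns 1 if the running CPU reports both AVX2 and
 * FMA, and 0 otherwise (including on non-x86 targets or other compilers). */
int cpu_supports_avx2_fma(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* Initialize the CPU feature data; only strictly required when
     * called before static constructors ran, but harmless otherwise. */
    __builtin_cpu_init();
    /* The builtin returns a non-zero value when the feature is present;
     * the && operators normalize the result to 0 or 1. */
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
#else
    return 0;
#endif
}
```

Such a check is only useful if it runs before any code compiled with the newer extensions, so it would usually live in a small translation unit built with baseline flags.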
With the -Q --help=target options, GCC can tell which specific features are enabled for a given architecture (excerpt):
gcc -Q --help=target -march=haswell
-mavx [enabled]
-mavx2 [enabled]
-mbmi2 [enabled]
-mssse3 [enabled]
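The enabled features are also visible from inside a C source file, because GCC and Clang predefine a macro such as __AVX2__ for each active extension. The sketch below collects the four features from the excerpt into a string. The names enabled_extensions and append_name are hypothetical helpers introduced only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Appends name plus a trailing space to buf if it still fits. */
static void append_name(char *buf, size_t size, const char *name)
{
    size_t used = strlen(buf);
    if (used + strlen(name) + 2 <= size) {   /* name + ' ' + NUL */
        strcat(buf, name);
        strcat(buf, " ");
    }
}

/* Fills buf with the extensions the compiler enabled for this file,
 * judged by the predefined feature macros, and returns buf. */
const char *enabled_extensions(char *buf, size_t size)
{
    if (size == 0)
        return buf;
    buf[0] = '\0';
#ifdef __SSSE3__
    append_name(buf, size, "ssse3");
#endif
#ifdef __AVX__
    append_name(buf, size, "avx");
#endif
#ifdef __AVX2__
    append_name(buf, size, "avx2");
#endif
#ifdef __BMI2__
    append_name(buf, size, "bmi2");
#endif
    return buf;
}
```

Compiled with -march=haswell, the buffer should list all four names; with default x86_64 flags it should stay empty, since none of these extensions is part of the baseline.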
Using the Wikipedia lists of AMD’s APU release dates and the Intel Core release dates together with the GCC x86 instruction set options, we can determine which options are common to AMD and Intel CPUs of a certain age, and can therefore be used to compile binaries with modern instruction sets that run on both brands.
As a result, here is a Makefile snippet for GCC/Clang that picks up all recent extensions which should run on all AMD CPUs since 2015 and Intel CPUs since 2013. The snippet sets up OPTIMIZE (to be added to CFLAGS/CXXFLAGS) by selecting all instructions supported by both Haswell and the latest Bulldozer, unless the BMI2 flag is missing from /proc/cpuinfo, which indicates that the compiling CPU is older:
uname_M ::= $(shell uname -m 2>/dev/null || echo None)
# AMD64 / X86_64 optimizations
ifeq ($(uname_M),x86_64)
  # Use haswell (Intel) and bdver4 (AMD Excavator Family 15h)
  # instruction sets, plus 2015 era tuning
  ARCHX64_2015 ::= -march=core2 -mtune=skylake
  ARCHX64_2015  += -mavx -mavx2 -mbmi -mbmi2 -mf16c -mfma -mfsgsbase
  ARCHX64_2015  += -mlzcnt -mmovbe -mpclmul -mpopcnt
  ARCHX64_2015  += -mrdrnd -msse4 -msse4.1 -msse4.2 -mxsave -mxsaveopt
  proc/cpuinfo ::= $(file < /proc/cpuinfo)
  ifeq ($(firstword $(filter bmi2, $(proc/cpuinfo))),bmi2)
    OPTIMIZE += $(ARCHX64_2015)
  else
    OPTIMIZE += -mcx16                  # CMPXCHG16B, AMD64 2005
    OPTIMIZE += -mmmx -msse -msse2      # Intel 2001, AMD 2003
    OPTIMIZE += -msse3                  # Intel 2004, AMD 2007
    OPTIMIZE += -mssse3                 # Intel 2006, AMD 2011
    # OPTIMIZE += -msse4a               # AMD only, 2007
    # OPTIMIZE += -msse4.1 -msse4.2     # Intel 2008, AMD 2011
    # OPTIMIZE += -mavx                 # Intel 2011, AMD 2011
    # OPTIMIZE += -mavx2                # Intel 2013, AMD 2015
  endif
endif
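The $(filter bmi2, ...) test above matches whole words only, so a flag like bmi1 does not count as bmi2. For programs that want to repeat the same test on their own, here is a rough C sketch of that whole-word match. The function name has_cpu_flag is hypothetical, and the caller is assumed to have read the contents of /proc/cpuinfo into a string already.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if flag occurs in text as a complete
 * whitespace-delimited word, 0 otherwise. */
int has_cpu_flag(const char *text, const char *flag)
{
    size_t len = strlen(flag);
    if (len == 0)
        return 0;
    const char *p = text;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at the beginning or after whitespace... */
        int starts = (p == text) || isspace((unsigned char)p[-1]);
        /* ...and end at the string end or before whitespace. */
        int ends = (p[len] == '\0') || isspace((unsigned char)p[len]);
        if (starts && ends)
            return 1;
        p++;   /* keep searching after a partial match */
    }
    return 0;
}
```

For example, has_cpu_flag("fpu avx2 bmi1 bmi2 erms", "bmi2") yields 1, while the same call on "fpu avx bmi1" yields 0.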
GNU Make versions before 4.2 do not yet support reading files with the $(file < ...) syntax. With such a version, peeking at the compiling CPU type can be done with this change:
- proc/cpuinfo ::= $(file < /proc/cpuinfo)
+ proc/cpuinfo ::= $(shell cat /proc/cpuinfo)
Requiring a CPU no older than 5-7 years should strike an acceptable balance between making use of acceleration and deprecating older CPUs for SIMD-hungry applications such as Beast.