In some of my projects, I’ve recently needed to use FMA (fused multiply-add) or AVX instructions. Compiling C/C++ for x86_64 by default only enables MMX and the early SSE extensions (SSE and SSE2), an instruction set that essentially predates the Core 2, which was introduced in 2006.
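To illustrate what FMA buys, here is a minimal sketch of a dot product in C, assuming GCC or Clang. It takes the 256-bit FMA path only when the compiler predefines __AVX__ and __FMA__ (as it does for -mavx -mfma or -march=haswell), and otherwise falls back to the plain scalar loop that the default x86_64 flags produce. The function name dot_product is hypothetical.

```c
#include <stddef.h>

#if defined(__AVX__) && defined(__FMA__)
#include <immintrin.h>
#endif

/* Dot product of a[0..n) and b[0..n).
 * Built with -mavx -mfma (or -march=haswell), the main loop performs
 * acc = a * b + acc on four doubles per instruction, each lane rounded
 * only once. With the default x86_64 flags the #if block is compiled
 * out and only the scalar loop below remains. */
double dot_product(const double *a, const double *b, size_t n)
{
    size_t i = 0;
    double sum = 0.0;

#if defined(__AVX__) && defined(__FMA__)
    __m256d acc = _mm256_setzero_pd();
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        acc = _mm256_fmadd_pd(va, vb, acc); /* acc = va * vb + acc */
    }
    /* Reduce the four partial sums held in the vector register. */
    double lanes[4];
    _mm256_storeu_pd(lanes, acc);
    sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif

    /* Remaining elements, or the whole array without AVX + FMA. */
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Because the FMA path sums four partial lanes and rounds each multiply-add only once, its result may differ in the last bits from the scalar loop for non-integer data; that is expected rather than a bug.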
Math code and vectorization can benefit greatly from more modern extensions like SSE4*, FMA, AVX and AVX2, but because of the way the -march compiler option works, these are not easily enabled for all CPU types of similar age.
-march=native enables all the bells and whistles of the compiling CPU, which is fine for locally compiled programs, e.g. for number crunching, but the resulting binary is unlikely to run on a CPU of similar age from a different manufacturer.
-march=bdver4 runs on the last AMD Bulldozer variant and later AMD CPUs, but may use FMA4 or SSE4A, which are AMD specific and fail on Intel CPUs.
-march=haswell runs on all Intel CPUs from the last 7 years, but may use HLE instructions, which Bulldozer CPUs do not support, etc.
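A binary built with one of these -march values typically dies with SIGILL on a CPU that lacks the selected instructions. As a minimal, best-effort sketch (assuming GCC or Clang, whose __builtin_cpu_supports built-in is used here), a program can compare the feature-test macros it was compiled with against what the running CPU reports. The function name cpu_supports_build_flags is hypothetical, and only a few extensions are checked.

```c
#include <stdbool.h>

/* Returns false if this translation unit was compiled to use one of the
 * checked extensions but the CPU it is running on does not report it.
 * Only compiled-in extensions are tested, so a default x86_64 build (or
 * a non-x86 build) always returns true. */
static bool cpu_supports_build_flags(void)
{
#if defined(__x86_64__) || defined(__i386__)
#  if defined(__AVX2__)
    if (!__builtin_cpu_supports("avx2"))
        return false;
#  endif
#  if defined(__FMA__)
    if (!__builtin_cpu_supports("fma"))
        return false;
#  endif
#  if defined(__BMI2__)
    if (!__builtin_cpu_supports("bmi2"))
        return false;
#  endif
#  if defined(__FMA4__) /* AMD only, e.g. from -march=bdver4 */
    if (!__builtin_cpu_supports("fma4"))
        return false;
#  endif
#endif
    return true;
}
```

One possible use is to call it first thing in main() and print a readable error instead of crashing. This only narrows the window: the compiler may already have emitted the extended instructions elsewhere, for example in static initializers.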
With the -Q --help=target options, GCC can tell which specific features are enabled for a given architecture (excerpt):
Using Wikipedia’s lists of AMD APU release dates and Intel Core release dates together with GCC’s x86 instruction set options, we can determine which options are common to AMD and Intel CPUs of a certain age, and thus compile binaries with modern instruction sets that run on both brands.
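The selection step is essentially a set intersection over the feature flags each target enables. Here is a minimal sketch in C; the two flag lists are hypothetical, shortened stand-ins rather than GCC’s full -march sets: the shared entries are taken from the ARCHX64_2015 list in the Makefile snippet, and the vendor-specific extras are the ones named in the text (HLE for Haswell, FMA4 and SSE4A for bdver4).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened flag lists for illustration only. */
static const char *const haswell_flags[] = {
    "avx2", "bmi2", "fma", "movbe", "hle", NULL
};
static const char *const bdver4_flags[] = {
    "avx2", "bmi2", "fma", "movbe", "fma4", "sse4a", NULL
};

/* Returns 1 if name occurs in the NULL-terminated list, else 0. */
static int in_list(const char *const *list, const char *name)
{
    for (; *list != NULL; ++list)
        if (strcmp(*list, name) == 0)
            return 1;
    return 0;
}

/* Stores the flags present in both lists into out, in the order of
 * list a, stopping after cap entries. Returns the number stored. */
static size_t common_flags(const char *const *a, const char *const *b,
                           const char **out, size_t cap)
{
    size_t n = 0;
    for (; *a != NULL && n < cap; ++a)
        if (in_list(b, *a))
            out[n++] = *a;
    return n;
}
```

With these lists, common_flags(haswell_flags, bdver4_flags, out, 8) keeps avx2, bmi2, fma and movbe and drops the vendor-specific hle, fma4 and sse4a, which is the kind of filtering that leads to a set like ARCHX64_2015.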
As a result, here is a Makefile snippet for GCC/Clang that picks up all recent extensions that should run on all AMD CPUs since 2015 and Intel CPUs since 2013. The snippet sets up OPTIMIZE (to be added to CFLAGS/CXXFLAGS) by selecting all instructions supported by both Haswell and the latest Bulldozer, unless /proc/cpuinfo shows that the compiling CPU is older, which is detected by checking for the BMI2 flag:
uname_M ::= $(shell uname -m 2>/dev/null || echo None)
# AMD64 / X86_64 optimizations
ifeq ($(uname_M),x86_64)
  # Use haswell (Intel) and bdver4 (AMD Excavator Family 15h)
  # instruction sets, plus 2015 era tuning
  ARCHX64_2015 ::= -march=core2 -mtune=skylake
  ARCHX64_2015 += -mavx -mavx2 -mbmi -mbmi2 -mf16c -mfma -mfsgsbase
  ARCHX64_2015 += -mlzcnt -mmovbe -mpclmul -mpopcnt
  ARCHX64_2015 += -mrdrnd -msse4 -msse4.1 -msse4.2 -mxsave -mxsaveopt
  proc/cpuinfo ::= $(file < /proc/cpuinfo)
  ifeq ($(firstword $(filter bmi2, $(proc/cpuinfo))),bmi2)
    OPTIMIZE += $(ARCHX64_2015)
  else
    OPTIMIZE += -mcx16                  # CMPXCHG16B, AMD64 2005
    OPTIMIZE += -mmmx -msse -msse2      # Intel 2001, AMD 2003
    OPTIMIZE += -msse3                  # Intel 2004, AMD 2007
    OPTIMIZE += -mssse3                 # Intel 2006, AMD 2011
    # OPTIMIZE += -msse4a               # AMD only, 2007
    # OPTIMIZE += -msse4.1 -msse4.2     # Intel 2008, AMD 2011
    # OPTIMIZE += -mavx                 # Intel 2011, AMD 2011
    # OPTIMIZE += -mavx2                # Intel 2013, AMD 2015
  endif
endif
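To confirm that OPTIMIZE actually reached the compiler, a source file can inspect the feature-test macros that GCC and Clang predefine for each enabled extension (__SSE4_2__, __AVX__, __AVX2__, __FMA__, __BMI2__, __F16C__). A minimal sketch follows; the EXT_* names and both functions are hypothetical, and the result reflects the compile flags, not the CPU the program runs on.

```c
#include <stdio.h>

/* Bits for a few of the extensions that ARCHX64_2015 enables. */
enum {
    EXT_SSE4_2 = 1 << 0,
    EXT_AVX    = 1 << 1,
    EXT_AVX2   = 1 << 2,
    EXT_FMA    = 1 << 3,
    EXT_BMI2   = 1 << 4,
    EXT_F16C   = 1 << 5,
};

/* Bitmask of the checked extensions enabled at compile time. */
static unsigned compiled_extensions(void)
{
    unsigned m = 0;
#ifdef __SSE4_2__
    m |= EXT_SSE4_2;
#endif
#ifdef __AVX__
    m |= EXT_AVX;
#endif
#ifdef __AVX2__
    m |= EXT_AVX2;
#endif
#ifdef __FMA__
    m |= EXT_FMA;
#endif
#ifdef __BMI2__
    m |= EXT_BMI2;
#endif
#ifdef __F16C__
    m |= EXT_F16C;
#endif
    return m;
}

/* Prints one "name yes/no" line per checked extension. */
static void print_compiled_extensions(FILE *out)
{
    static const struct { unsigned bit; const char *name; } names[] = {
        { EXT_SSE4_2, "sse4.2" }, { EXT_AVX, "avx" }, { EXT_AVX2, "avx2" },
        { EXT_FMA, "fma" }, { EXT_BMI2, "bmi2" }, { EXT_F16C, "f16c" },
    };
    unsigned m = compiled_extensions();
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
        fprintf(out, "%-7s %s\n", names[i].name,
                (m & names[i].bit) ? "yes" : "no");
}
```

Built with the ARCHX64_2015 flags, all six lines should read yes; with a stock x86_64 toolchain and no extra flags they typically all read no.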
If a GNU Make version older than 4.2 is used, which does not yet support reading files with $(file <...), the compiling CPU type can be peeked at with this change:
Requiring a CPU that is at most 5-7 years old should strike an acceptable balance between using hardware acceleration and deprecating older CPUs for SIMD-hungry applications like Beast.