enderandrew wrote: I'm probably just going to go with:
CFLAGS="-O3 -march=athlon64 -ffast-math -funroll-loops -fpeel-loops -ftracer -pipe"
I would at least urge you to reconsider -funroll-loops and -fpeel-loops. Yes, you will speed up some programs, but you'll probably slow down just as many.
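To make the trade-off concrete, here is a hand-written C sketch of roughly what loop unrolling does. The function names are made up for illustration, and GCC's real unroll factor and generated code will differ. The unrolled version takes fewer loop branches per element, but the larger body plus the extra remainder loop make the code bigger. The GCC manual itself says -funroll-loops "makes code larger, and may or may not make it run faster", which is why it can go either way.

```c
#include <stddef.h>

/* Plain loop: one add plus one compare/branch per element. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same loop unrolled by hand four times, roughly the kind of
 * transformation -funroll-loops may apply. Fewer branches per element,
 * but a bigger loop body and a remainder loop for the leftovers. */
long sum_unrolled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)   /* remainder: 0 to 3 elements */
        s += a[i];
    return s;
}
```

Both functions return the same sum; only the shape (and size) of the generated loop changes.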
enderandrew wrote: I know that several people here seem to be against -ffast-math; however, others insist they've used it for years without problems and that it gives substantial performance increases. If the packages it might break filter it out automatically, then there really isn't much risk, and it does offer benefits.
The point about -ffast-math is that most packages that would derive any substantial benefit from it already enable it themselves, and that it will occasionally break a package.
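One reason it breaks things: -ffast-math allows GCC to reassociate floating-point arithmetic, but IEEE 754 addition is not associative. A minimal C sketch (the function names are hypothetical, and whether GCC actually regroups any particular expression depends on the code and version):

```c
/* IEEE 754 addition is not associative: regrouping can change the result.
 * Without -ffast-math GCC must keep the grouping as written; with it,
 * the compiler is allowed to reassociate expressions like these. */
double left_grouped(double a, double b, double c)
{
    return (a + b) + c;
}

double right_grouped(double a, double b, double c)
{
    return a + (b + c);
}
```

Compiled normally, left_grouped(1e20, -1e20, 1.0) returns 1.0 while right_grouped(1e20, -1e20, 1.0) returns 0.0, because -1e20 + 1.0 rounds back to -1e20 in double precision. Code that depends on the exact grouping (or on NaN/Inf handling, which -ffast-math also relaxes) is the kind that can quietly change behaviour.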
enderandrew wrote: My question is: what about the following flags? They are all recommended in the AMD PDF, and I haven't seen anyone discuss them.
-fmove-all-movables -freduce-all-givs -mno-align-stringops -minline-all-stringops -mno-push-args
You may want to take a look at
http://gcc.gnu.org/onlinedocs/gcc-3.4.4 ... tions.html.
-fmove-all-movables: Moves all loop-invariant computations out of the loop. This is almost always a good thing, and I've never had this flag break anything. On the other hand, I've never had it give a program a significant speedup either.
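For anyone unfamiliar with the term, here is a small before/after sketch in C of loop-invariant code motion, written by hand. The function names are hypothetical; the compiler does this on its own intermediate code, not on your source.

```c
#include <stddef.h>

/* Before: x * y never changes inside the loop, yet it is written
 * (and, naively, computed) on every iteration. */
void scale_naive(int *out, const int *in, size_t n, int x, int y)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * (x * y);
}

/* After: the loop-invariant product is hoisted out and computed once.
 * This is the transformation loop-invariant code motion performs. */
void scale_hoisted(int *out, const int *in, size_t n, int x, int y)
{
    const int k = x * y;   /* computed once, outside the loop */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```

Since -O2 already catches simple cases like this one, the extra flag only matters for the invariants the default pass leaves behind.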
-freduce-all-givs: Forces all general induction variables (givs) in loops to be strength-reduced. Strength reduction replaces an expensive operation on a variable with cheaper operations that compute the same result. Also a good thing. However, I have had this flag break the occasional program.
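A hand-written C sketch of the idea: i * stride below is a general induction variable, and strength reduction turns the per-iteration multiplication into a running addition. Function and variable names are hypothetical, and this only illustrates the transformation, not what GCC 3.4 emits.

```c
#include <stddef.h>

/* Before: each element's index is i * stride, one multiply per iteration. */
long sum_column_mul(const int *m, size_t rows, size_t stride)
{
    long s = 0;
    for (size_t i = 0; i < rows; i++)
        s += m[i * stride];
    return s;
}

/* After strength reduction: the multiply is replaced by an offset
 * that is bumped by stride each iteration (an addition). */
long sum_column_add(const int *m, size_t rows, size_t stride)
{
    long s = 0;
    size_t off = 0;   /* invariant: off == i * stride */
    for (size_t i = 0; i < rows; i++, off += stride)
        s += m[off];
    return s;
}
```

For example, with a 3x2 row-major matrix {1, 2, 3, 4, 5, 6} and stride 2, both functions sum column 0 (1 + 3 + 5) and return 9.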
With both of these flags, the issue isn't what they attempt to do but how effective they actually are at it. With GCC 3.4.3 they rarely enable any significant optimizations, probably because -floop-optimize and -fstrength-reduce (both already enabled at -O2) handle the cases that can yield any improvement.
-mno-align-stringops: YMMV with this flag, but I've never had any problems with it and have seen a small, insignificant improvement in string-heavy routines. (Has anyone seen many programs that really benefit from alignment on x86_64?)
-minline-all-stringops: The GCC manual is actually pretty clear on this one: "increase code size, but may improve performance of code that depends on fast memcpy, strlen and memset for short lengths." In other words, it should only be turned on selectively, program by program, not for everything.
-mno-push-args: This stores outgoing arguments with mov instructions instead of push, which can sometimes give better scheduling and out-of-order execution than push/pop sequences. I've never had any luck with it helping anything, though.
I'm surprised AMD didn't mention -maccumulate-outgoing-args, since for programs with a lot of recursion, or simply a lot of function calls, it almost always speeds things up, though it will always bloat your binaries in the process.