geckosenator, I was greatly surprised by your run-time test results! When I ran my own tests, emerging bzip2 with different CFLAGS, I found that -Os gave the fastest run when bzipping and an average one when unbzipping. I also found that -O3 was the worst flag for bzip2.
It seems that every test behaves differently on different processors/architectures. For example, I have an Athlon XP. The AMD manual recommends aligning loops to the start of 16-byte blocks of code. In practice that means padding the code with empty instructions that do nothing but still need processor resources to execute. So, in my opinion, nobody should use -O2/-O3 on an Athlon XP without at least -fno-align-labels and -fno-align-loops. According to the standard GCC documentation, -Os turns these alignment options off and does something more to decrease the executable size.
It also seems that such tests say NOTHING about REAL program performance. ALL the tests I've ever heard of tell you not to switch between windows, move the mouse, etc. while they are running, so they don't take multitasking into account. But where can you find a single-tasking computer (server or desktop) today? I could run a test for a DAY AND A NIGHT, then build Gentoo with the best-performing CFLAGS (according to that test), and then find that the system was much faster and more responsive before I rebuilt it.
So, in my opinion, anyone posting test results should give:
1. The test itself, or a link to it, or some information on how to reproduce the results. Or you can just add something like 'I feel my computer is faster than before'. Sometimes the latter is more useful, sometimes the former.
2. The hardware the test was run on (the processor, RAM, maybe swap size).
3. The compiler, kernel version, etc.
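
To make the list above concrete, here is a rough Python sketch of a script that times a command several times and prints the environment details together with the numbers. Everything in it is my own assumption, not something from the tests discussed in this thread: the function names (time_command, environment, compiler_version, report) are made up for illustration, the command being timed is only a placeholder, and it looks for a compiler called gcc in PATH.

```python
import platform
import shutil
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def compiler_version(compiler="gcc"):
    """First line of `<compiler> --version`, or a note if it is not in PATH."""
    if shutil.which(compiler) is None:
        return compiler + " not found"
    result = subprocess.run([compiler, "--version"],
                            capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else "unknown"


def environment():
    """Hardware, kernel and compiler details (items 2 and 3 of the list)."""
    return {
        "machine": platform.machine() or "unknown",
        "processor": platform.processor() or "unknown",
        "system": platform.system(),
        "kernel": platform.release(),
        "compiler": compiler_version(),
    }


def report(label, cmd, timings):
    """Build a plain-text report: what was run, where, and how long it took."""
    lines = ["test: " + label, "command: " + " ".join(cmd)]
    lines += [f"{key}: {value}" for key, value in environment().items()]
    lines.append(f"runs: {len(timings)}")
    lines.append(f"min: {min(timings):.3f}s")
    lines.append(f"median: {statistics.median(timings):.3f}s")
    lines.append(f"max: {max(timings):.3f}s")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder command (an assumption): replace it with the program
    # you actually want to measure.
    cmd = [sys.executable, "-c", "pass"]
    print(report("placeholder", cmd, time_command(cmd, runs=3)))
```

Printing min, median and max instead of a single number at least shows how much the runs vary, which gives a hint of how much other activity on the machine affected them. It doesn't record RAM or swap size, so those still have to be added by hand.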






