dpaddy Tux's lil' helper
Joined: 25 Jun 2008 Posts: 142
|
Posted: Thu Apr 20, 2017 8:49 pm Post subject: [SOLVED] Is amd64 no-multilib faster? |
|
|
I have made so many mistakes with an install, and backtracked and started over so many times to fix them, that I no longer have the grit to run the experiment myself, so perhaps someone knows the answer to the following...
Scenario A: I use the default multilib profile and proceed with a Gentoo install
Scenario B: I use the no-multilib profile and proceed with a Gentoo install
If I compile the same C code (64-bit integers and doubles) with the same gcc version and compile flags, should I expect the executable to run faster in scenario B?
Please: no opinions... I'm looking for facts (of the sort proved by actual experience).
The kernel matters more than I would have thought: CONFIG_TRANSPARENT_HUGEPAGE=y makes some code Code: |
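#include <stdio.h>
#include <stdlib.h>
#define B 10000000   /* iteration cap per point */
/*
#define W 1600
#define H 1600
*/
#define W 100
#define H 100
unsigned char M[H][W];   /* 1 = point did not escape within B iterations */
long long **P,*A[H];     /* hit counts; P and A[i] are offset so indices may be negative */
/* Write the hit counts to ganesh.out as raw long longs, row by row. */
void save(void)
{
    int i,j; FILE *f = fopen("ganesh.out","wb");
    if (!f){ perror("ganesh.out"); exit(1); }
    // fprintf(f,"%d\n%d\n",W,H);
    for (i = 0; i < H; i++)
        for (j = -W/2; j < W/2; j++){
            fwrite(&A[i][j],sizeof(long long),1,f);
        }
    fclose(f);
}
int main(int argc, char *argv[])
{
    int i,j,k; double a,b,w,x,y,z;
    for (i = 0; i < H; i++){
        A[i] = (long long *)malloc(W*sizeof(long long));
        if (!A[i]){ perror("malloc"); return 1; }
        for (j = 0; j < W; j++) A[i][j] = 0;
        A[i] += (W/2);   /* valid column indices are now -W/2 .. W/2-1 */
    }
    P = A+(H/2);         /* valid row indices are now -H/2 .. H/2-1 */
    /* Pass 1: mark the points that do not escape within B iterations. */
    for (i = 0; i < H; i++){
        // printf("%d\n",i);
        for (j = 0; j < W; j++){
            M[i][j] = k = 0;
            a = (i-(H/2.))/(H/4.); b = (j-(W/2.))/(W/4.);
            x = y = w = z = 0.;
            do {
                if (k++ == B){ M[i][j] = 1; break; }
                y = 2*y*x + b;
                x = w - z + a;
                w = x*x;
                z = y*y;
            } while (w+z < 4.);
        }
    }
    /* Pass 2: replay the orbit of every escaping point and count each cell it visits. */
    while (i--){   /* i == H on entry, so this runs i = H-1 .. 0 */
        // printf("%d\n",i);
        for (j = 0; j < W; j++){
            if (M[i][j]) continue;
            k = 0;
            a = (i-(H/2.))/(H/4.); b = (j-(W/2.))/(W/4.);
            x = y = w = z = 0.;
        n:
            y = 2*y*x + b;
            x = w - z + a;
            w = x*x;
            z = y*y;
            if (w+z < 4.){
                /* |x|,|y| < 2 here, so both indices stay inside the offset arrays */
                P[(int)(x*(H/4.))][(int)(y*(W/4.))]++;
                if (++k < B) goto n;
            }
        }
    }
    save();
    return 0;
}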
// gcc -O3 -pipe -march=native -o ganesh ganesh.c |
run 3% SLOWER (on a Ryzen 1700, 64GB DDR4 2133) ... so I turned that off (yes, I read the caution about the OOM killer). I can get another 3% gain with other kernel settings (which I accidentally bumped into a few days ago, but have not since been able to remember/find; however, currently CONFIG_SCHED_OMIT_FRAME_POINTER=y, CONFIG_CC_OPTIMIZE_FOR_SIZE is not set).
Whether the code is shit is beside the point; the more important issue is whether kernel/profile/installation issues might (all else being equal -- including hardware and compiler) make a greater than 1% difference... if so, then what is recommended for speed (details matter -- they nearly always do -- but are there good general rules of thumb)?
Last edited by dpaddy on Thu Apr 20, 2017 10:06 pm; edited 1 time in total |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54216 Location: 56N 3W
|
Posted: Thu Apr 20, 2017 9:38 pm Post subject: |
|
|
dpaddy,
The multilib/no-multilib choice makes no difference to code execution time.
With no-multilib, only 64 bit code is built.
With multilib, you have the same 64 bit code as above but 32 bit code is also built as required. This extends the build time.
e.g. You will build glibc twice with the multilib profile.
The down side to no-multilib is that you can't run 32 bit code.
Code: | // gcc -O3 -pipe -march=native -o ganesh ganesh.c |
-O3 is scary. It makes the code bigger and in some cases makes it slower, as the working set no longer fits in the cache.
-march=native is not the best choice for Ryzen yet.
gcc-6.3 interprets -march=native correctly for Ryzen, but a few other settings produce faster code, mostly because the optimization for Ryzen in gcc is far from complete.
The code looks like it started life as fortran.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
dpaddy Tux's lil' helper
Joined: 25 Jun 2008 Posts: 142
|
Posted: Thu Apr 20, 2017 10:04 pm Post subject: |
|
|
> The code looks like it started life as fortran.
That's because I did |
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
|
Posted: Fri Apr 21, 2017 12:19 am Post subject: Re: [SOLVED] Is amd64 no-multilib faster? |
|
|
dpaddy wrote: | The kernel matters more than I would have thought: CONFIG_TRANSPARENT_HUGEPAGE=y makes some code [...] run 3% SLOWER (on a Ryzen 1700, 64GB DDR4 2133) ... so I turned that off |
dpaddy ... you could also set CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y (rather than *_ALWAYS), so that only code which asks for transparent hugepages via madvise gets them, and code that doesn't ask is unaffected.
dpaddy wrote: | I can get another 3% gain with other kernel settings (which I accidentally bumped into a few days ago, but have not since been able to remember/find; however, currently CONFIG_SCHED_OMIT_FRAME_POINTER=y, CONFIG_CC_OPTIMIZE_FOR_SIZE is not set). |
In my experience, if you want to improve speed then you should opt for a libc other than glibc (e.g. musl). I don't remember exactly, but I have a feeling we've had this conversation before, perhaps re uClibc?
best ... khay |