-floop-parallelize-all will probably make things worse: splitting an int i = 0; i < 3 loop across 4 threads will be about 20x slower than just running it serially, if not more. Let the compiler decide whether parallelizing a given loop is worthwhile. -lgomp is useless, by the way: it will not improve things, and if it overrides --as-needed, it will make ...