drwook Veteran

Joined: 30 Mar 2005 Posts: 1322 Location: London
|
Posted: Sun Mar 25, 2012 8:38 am Post subject: Status of Graphite |
|
|
Hi all, Silly question perhaps - but does anyone know the "official line" on Graphite support? Is it still "you break it, you fix it"? I'm assuming it is.
Another question - is anyone using Graphite generally, and does anyone have any objective or subjective info on stability, compatibility and performance? (i.e. what compiles but is unstable, what will fail to compile, and whether it's even in anyone's interest to use it if it does work)
Been running gcc-4.6.2 for a while without significant issue, but eyeing up 4.7.0 now too... I can only find opinion/info relating to graphite on 4.4/4.5 from searching.
BoneKracker Veteran


Joined: 14 Mar 2006 Posts: 1488 Location: U.S.A.
|
Posted: Mon Mar 26, 2012 5:43 am Post subject: |
|
|
I used it for months. I used it on a ~x86 desktop, a hardened ~x86 server, and a hardened x86 firewall/router. I didn't notice any perceptible improvement in performance, although I did no benchmarking at all. Neither did I check to see what impact, if any, it had on compilation times. I don't recall having any problems (there might have been a bug or two where applications needed to be patched). I have since removed it, but might enable it in the future. In general, I think it's a good concept.
_________________ Oldthinkers unbellyfeel INGSOC.
-- Headline of a document on Winston Smith's terminal in his cubicle at the Ministry of Truth, seen briefly in the background in one scene of the movie rendition of Nineteen Eighty-Four.
Ant P. Veteran

Joined: 18 Apr 2009 Posts: 1920 Location: UK
|
Posted: Mon Mar 26, 2012 6:13 pm Post subject: |
|
|
I've compiled everything on my 3 systems with graphite cflags. Stability seems fine, at least.
jtshs256 n00b

Joined: 25 Mar 2011 Posts: 17
|
Posted: Tue Mar 27, 2012 12:33 am Post subject: |
|
|
I have used graphite for at least a year with gcc 4.5.x and 4.6.x. It doesn't cause any compile problems, nor any significant performance improvement. You can enable the graphite flags globally.
BoneKracker Veteran


Joined: 14 Mar 2006 Posts: 1488 Location: U.S.A.
|
Posted: Tue Mar 27, 2012 1:22 am Post subject: |
|
|
I was selective in the flags I chose to enable globally. As I recall, one or two available at the time seemed like they would too often have negative performance consequences. I would be sure to read the gcc documentation and understand what each flag actually does.
As I understood it at the time, some of these flags cause the compiler to evaluate code and selectively apply loop optimizations. However, as I recall (and I may not be accurate), one or two of the graphite-related flags available at the time seemed more ruthless, and may not be appropriate globally (i.e., they are of the same general nature as -funroll-all-loops, causing global changes that ought to be only selectively applied).
But I don't really know what I'm talking about, so take it with a grain of salt. If one is interested in performance, I would suggest one should not use graphite without first thoroughly reading the documentation of the graphite flags in the gcc manual for the version of gcc in question.
http://gcc.gnu.org/onlinedocs/
Apheus Apprentice

Joined: 12 Jul 2008 Posts: 182
|
Posted: Tue Mar 27, 2012 10:05 am Post subject: |
|
|
I have been using graphite globally on two machines for some weeks now (-fgraphite-identity, -floop-interchange, -floop-strip-mine, -floop-block), but did not do an "emerge -e world", so not all packages have been recompiled yet. I do some firefox benchmarks from time to time (SunSpider, Kraken, V8, PeaceKeeper), but the numbers show nothing clear wrt CFLAGS: upstream optimizations throughout the versions 9>10>11 seem to be more important for performance, and maybe USE=pgo.
I have excluded the most important system and toolchain packages from customized CFLAGS: libtool, glibc, gcc, coreutils, udev, openrc, sysvinit, binutils, bash, e2fsprogs. cloog-ppl is configured to build without graphite to work around the chicken-and-egg problem when updating it. For grub and nvidia-drivers, I did not enable USE=custom-cflags. I noticed a screen update error with the grub default-entry countdown when it was built with custom CFLAGS.
The other problems so far are few:
PyQt4 does not build
quake3 crashes when built with graphite as soon as a map is entered (~amd64 version, the stable version does not work at all)
gcc is the current stable amd64 version 4.5
codestation Tux's lil' helper


Joined: 09 Nov 2008 Posts: 126 Location: /dev/negi
|
Posted: Wed Mar 28, 2012 2:15 am Post subject: |
|
|
Since I got a new laptop, I did a clean install, so all my packages have been compiled with the graphite flags for the last 6 months. I am using the current hardmasked gcc version (4.6.2) since the open bugs don't affect me.
This new laptop is more powerful than my old one, so I don't have any performance/benchmark data, but in general I don't have problems. The only packages that failed to compile for me with graphite flags are PyQt4, blender and postgresql-[base|server].
_________________ Just feel the code...
BoneKracker Veteran


Joined: 14 Mar 2006 Posts: 1488 Location: U.S.A.
|
Posted: Wed Mar 28, 2012 3:34 am Post subject: |
|
|
I would very much like to see a scientifically-performed benchmarking analysis of the performance impact.
darklegion Guru

Joined: 14 Nov 2004 Posts: 440
|
Posted: Sat Mar 31, 2012 8:26 am Post subject: |
|
|
Would be nice to see some benchmarks of LTO as well. Although as far as my experience goes, profile guided optimisation is the most useful and seems to yield around 5-10% performance boost with Wine and Dolphin. However, you can't enable this globally of course so not really useful for a full system.
-O3 or -Ofast can be useful for some programs too, but it is not a good idea at all to enable them globally.
Etal Veteran


Joined: 15 Jul 2005 Posts: 1633
|
Posted: Sat Mar 31, 2012 4:58 pm Post subject: |
|
|
If anything, with LTO you'll save a ton of space, especially with C++ applications - binaries can shrink by more than 50%.
I don't know about performance, though - never tested it. But I can't think of a way LTO could cause it to decrease.
_________________ “And even in authoritarian countries, information networks are helping people discover new facts and making governments more accountable.”– Hillary Clinton, Jan. 21, 2010
Yamakuzure l33t

Joined: 21 Jun 2006 Posts: 951 Location: Bardowick, Germany
|
Posted: Wed Apr 11, 2012 4:36 pm Post subject: |
|
|
Apheus wrote:
    I use graphite globally on two machines for some weeks now (-fgraphite-identity, -floop-interchange, -floop-strip-mine, -floop-block)
Well, those flags do not do much, you know. They just enable slight reorganization of nested loops to reduce cache misses. (*)
The power of graphite is revealed with "-ftree-loop-distribution", "-floop-parallelize-all" and "-ftree-parallelize-loops=<number_of_threads>". Those will split loops apart, nested or not, if their iterations do not depend on each other, and carry out the parts using threads.
...I once tried that globally.
...It produced a nice automatic "fork-bomb" halting my system after 5 to 10 minutes.
AFAIR there are (still?) a few packages that are not very happy about "-fgraphite-identity", but basically the four flags mentioned should be safe enough. Only the loop parallelization should not be used globally. And IMHO it is a bad idea to use them on libraries that are a) used by many libs/apps and b) do multi-threading on their own.
However, this site should give you a fair impression of the current state of Graphite: http://gcc.gnu.org/wiki/Graphite
(*): For the curious:

-fgraphite-identity
Enable the identity transformation for graphite. For every SCoP we generate the polyhedral representation and transform it back to gimple. Using -fgraphite-identity we can check the costs or benefits of the GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations are also performed by the code generator CLooG, like index splitting and dead code elimination in loops.

-floop-interchange
Perform loop interchange transformations on loops. Interchanging two nested loops switches the inner and outer loops. For example, given a loop like:
Code:
    DO J = 1, M
      DO I = 1, N
        A(J, I) = A(J, I) * C
      ENDDO
    ENDDO
loop interchange will transform the loop as if the user had written:
Code:
    DO I = 1, N
      DO J = 1, M
        A(J, I) = A(J, I) * C
      ENDDO
    ENDDO
which can be beneficial when N is larger than the caches, because in Fortran, the elements of an array are stored in memory contiguously by column, and the original loop iterates over rows, potentially creating at each access a cache miss. This optimization applies to all the languages supported by GCC and is not limited to Fortran. To use this code transformation, GCC has to be configured with --with-ppl and --with-cloog to enable the Graphite loop transformation infrastructure.

-floop-strip-mine
Perform loop strip mining transformations on loops. Strip mining splits a loop into two nested loops. The outer loop has strides equal to the strip size and the inner loop has strides of the original loop within a strip. The strip length can be changed using the loop-block-tile-size parameter. For example, given a loop like:
Code:
    DO I = 1, N
      A(I) = A(I) + C
    ENDDO
loop strip mining will transform the loop as if the user had written:
Code:
    DO II = 1, N, 51
      DO I = II, min (II + 50, N)
        A(I) = A(I) + C
      ENDDO
    ENDDO
This optimization applies to all the languages supported by GCC and is not limited to Fortran. To use this code transformation, GCC has to be configured with --with-ppl and --with-cloog to enable the Graphite loop transformation infrastructure.

-floop-block
Perform loop blocking transformations on loops. Blocking strip mines each loop in the loop nest such that the memory accesses of the element loops fit inside caches. The strip length can be changed using the loop-block-tile-size parameter. For example, given a loop like:
Code:
    DO I = 1, N
      DO J = 1, M
        A(J, I) = B(I) + C(J)
      ENDDO
    ENDDO
loop blocking will transform the loop as if the user had written:
Code:
    DO II = 1, N, 51
      DO JJ = 1, M, 51
        DO I = II, min (II + 50, N)
          DO J = JJ, min (JJ + 50, M)
            A(J, I) = B(I) + C(J)
          ENDDO
        ENDDO
      ENDDO
    ENDDO
which can be beneficial when M is larger than the caches, because the innermost loop will iterate over a smaller amount of data that can be kept in the caches. This optimization applies to all the languages supported by GCC and is not limited to Fortran. To use this code transformation, GCC has to be configured with --with-ppl and --with-cloog to enable the Graphite loop transformation infrastructure.

-ftree-loop-distribution
Perform loop distribution. This flag can improve cache performance on big loop bodies and allow further loop optimizations, like parallelization or vectorization, to take place. For example, the loop
Code:
    DO I = 1, N
      A(I) = B(I) + C
      D(I) = E(I) * F
    ENDDO
is transformed to
Code:
    DO I = 1, N
      A(I) = B(I) + C
    ENDDO
    DO I = 1, N
      D(I) = E(I) * F
    ENDDO

-floop-parallelize-all
Use the Graphite data dependence analysis to identify loops that can be parallelized. Parallelize all the loops that can be analyzed to not contain loop carried dependences without checking that it is profitable to parallelize the loops.

-ftree-parallelize-loops=n
Parallelize loops, i.e., split their iteration space to run in n threads. This is only possible for loops whose iterations are independent and can be arbitrarily reordered. The optimization is only profitable on multiprocessor machines, for loops that are CPU-intensive, rather than constrained e.g. by memory bandwidth. This option implies -pthread, and thus is only supported on targets that have support for -pthread.
_________________ I *do* know that I easily aggravate people due to my condensed writing. Rule of thumb: If I wrote anything that can be understood in two different ways, and one way offends you, then I meant the other!
Yamakuzure l33t

Joined: 21 Jun 2006 Posts: 951 Location: Bardowick, Germany
|
Posted: Fri Apr 13, 2012 10:20 am Post subject: |
|
|
Etal wrote:
    If anything, with LTO you'll save a ton of space, especially with C++ applications - binaries can shrink by more than 50%.
    I don't know about performance, though - never tested it. But I can't think of a way how LTO could cause it to decrease.
Didn't see this earlier: If you are using LTO, you should be aware of the warning portage gives you after the merge of gcc:
gcc-ebuilds wrote:
    * LTO support is still experimental and unstable
    * Any bugs resulting from the use of LTO will not be fixed.
depontius Advocate

Joined: 05 May 2004 Posts: 2156
|
Posted: Fri Apr 13, 2012 12:28 pm Post subject: |
|
|
Pardon me please, but can someone give a simple definition of Graphite?
From what I can tell on this thread, it seems to be a separate set or class of gcc optimizations. It also sounds almost as if it's being separately developed, and then grafted on. I've done only a little searching, but haven't found a concise definition, just some low-level, gritty "you have to know the answer to get the answer" kind of stuff. _________________ .sigs waste space and bandwidth |
Etal Veteran


Joined: 15 Jul 2005 Posts: 1633
|
Posted: Fri Apr 13, 2012 1:22 pm Post subject: |
|
|
Yamakuzure wrote:
    Etal wrote:
        If anything, with LTO you'll save a ton of space, especially with C++ applications - binaries can shrink by more than 50%.
        I don't know about performance, though - never tested it. But I can't think of a way how LTO could cause it to decrease.
    Didn't see this earlier: If you are using LTO, you should be aware of the warning portage gives you after the merge of gcc:
    gcc-ebuilds wrote:
        * LTO support is still experimental and unstable
        * Any bugs resulting from the use of LTO will not be fixed.
I know, but I was responding to the poster above me.
Yamakuzure l33t

Joined: 21 Jun 2006 Posts: 951 Location: Bardowick, Germany
|
Posted: Fri Apr 13, 2012 1:50 pm Post subject: |
|
|
depontius wrote:
    Pardon me please, but can someone give a simple definition of Graphite?
    From what I can tell on this thread, it seems to be a separate set or class of gcc optimizations. It also sounds almost as if it's being separately developed, and then grafted on. I've done only a little searching, but haven't found a concise definition, just some low-level, gritty "you have to know the answer to get the answer" kind of stuff.
The link I posted explains it pretty well.
gcc.gnu.org/wiki/Graphite wrote:
    Graphite is a framework for high-level memory optimizations using the polyhedral model.
That says it all, unless you have no clue what "polyhedral model" means. The shortest explanation can be found in wikipedia:
http://en.wikipedia.org/wiki/Polyhedral_model wrote:
    The polyhedral model (also called the polytope method) is a mathematical framework for loop nest optimization in program optimization. The polytope method treats each loop iteration within nested loops as lattice points inside mathematical objects called polytopes, performs affine transformations or more general non-affine transformations such as tiling on the polytopes, and then converts the transformed polytopes into equivalent, but optimized (depending on targeted optimization goal), loop nests through polyhedra scanning.
So in short: Graphite enables gcc to optimize loops in a memory friendly way and can (if wanted) split them to be iterated using parallel threads. This (hopefully) optimizes performance by a) parallelization and b) fewer cache misses.
depontius Advocate

Joined: 05 May 2004 Posts: 2156
|
Posted: Fri Apr 13, 2012 5:52 pm Post subject: |
|
|
Yamakuzure wrote:
    depontius wrote:
        Pardon me please, but can someone give a simple definition of Graphite?
        From what I can tell on this thread, it seems to be a separate set or class of gcc optimizations. It also sounds almost as if it's being separately developed, and then grafted on. I've done only a little searching, but haven't found a concise definition, just some low-level, gritty "you have to know the answer to get the answer" kind of stuff.
    The link I posted explains it pretty well.
    gcc.gnu.org/wiki/Graphite wrote:
        Graphite is a framework for high-level memory optimizations using the polyhedral model.
    That says it all, unless you have no clue what "polyhedral model" means.
That says it all. This is the first time I've ever heard of the "polyhedral model." Of course I know what a polyhedron is. I've never formally taken graph theory, but I'm somewhat familiar with the concept of mapping things into edges, vertices, etc.
Yamakuzure wrote:
    The shortest explanation can be found in wikipedia:
    http://en.wikipedia.org/wiki/Polyhedral_model wrote:
        The polyhedral model (also called the polytope method) is a mathematical framework for loop nest optimization in program optimization. The polytope method treats each loop iteration within nested loops as lattice points inside mathematical objects called polytopes, performs affine transformations or more general non-affine transformations such as tiling on the polytopes, and then converts the transformed polytopes into equivalent, but optimized (depending on targeted optimization goal), loop nests through polyhedra scanning.
    So in short: Graphite enables gcc to optimize loops in a memory friendly way and can (if wanted) split them to be iterated using parallel threads. This (hopefully) optimizes performance by a) parallelization and b) fewer cache misses.
This is also the first time I've ever heard the word "polytope". I haven't hit your second link yet, but once I see the word "tiling" it sounds almost as if you're using the faces of the polyhedron as well as the edges and vertices.
At the very least, I have some interesting links to follow.
Apheus Apprentice

Joined: 12 Jul 2008 Posts: 182
|
Posted: Mon Jun 11, 2012 8:36 pm Post subject: |
|
|
I have found that rekonq (and konqueror with kwebkitpart) crash on most javascript-using websites if qt-webkit is compiled with the graphite flags.