savalas n00b
Joined: 10 Dec 2009 Posts: 40
Posted: Mon Mar 22, 2010 6:05 pm Post subject: Profile-guided optimization: how, and what?
Out of curiosity, I've recently become interested in profile-guided optimization. I've read about PGO for years but never tried it until recently, so I'm not sure what would benefit most from it in a desktop environment or where the low-hanging fruit is. Any ideas?
PGO is really good at skewing benchmarks. You can profile an application against a specific benchmark and make it run super-duper-faster (around 10% faster, I guess), but doing so would probably also make that application slower in other real-world situations. That's the irony of PGO: if you profile your application during benchmarking, you'll get very quantifiable improvements but no idea whether they actually improve your everyday usage. And if you profile during actual usage (the correct way to profile), benchmark numbers may very well go down even if overall performance increases.
Anyway, I titled this thread "how, and what?". I may not be sure about the "what", but I can at least give you the "how". By default, a binary built with GCC's profiling writes its .gcda data files into the tree where it was compiled, which doesn't sit well with Portage, since Portage wipes its temp dir after compilation. So I store those files in a separate dir, /var/tmp/pgo in my case. This is my how-to:
- compile the target package with the extra CFLAGS/CXXFLAGS -fprofile-generate=/var/tmp/pgo
- use the application for a few hours or a day
- reboot, or make sure every process running that application has exited, so that the statistics are written to disk (the data is flushed when a process exits)
- recompile the target package with the extra CFLAGS/CXXFLAGS -fprofile-use=/var/tmp/pgo
Writing this, I realized that I didn't add any LDFLAGS, although I'm almost certain I was supposed to. If anyone has info about that, please feel free to share.
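For anyone who would rather not edit make.conf twice per package, the two phases above could be kept as per-package Portage env files. This is only a sketch under a few assumptions: the file names pgo-generate.conf and pgo-use.conf and the ETC_DIR variable are made up for illustration, x11-libs/pixman is just an example target, and the LDFLAGS line rests on GCC's documented requirement that -fprofile-generate be given when linking as well as when compiling (it only helps if the package links through the gcc driver). The script writes to a scratch dir unless ETC_DIR is pointed at /etc/portage.

```shell
#!/bin/sh
# Sketch: write one Portage env file per PGO phase.
# PGO_DIR is the profile dir from the how-to; ETC_DIR is a hypothetical
# variable that defaults to a scratch dir so nothing under /etc is touched.
PGO_DIR="${PGO_DIR:-/var/tmp/pgo}"
ETC_DIR="${ETC_DIR:-${TMPDIR:-/tmp}/portage-pgo-sketch}"

mkdir -p "$ETC_DIR/env"

# Phase 1: instrumented build. The link step gets the flag too
# (assumption: the package links via the gcc driver, not ld directly).
cat > "$ETC_DIR/env/pgo-generate.conf" <<EOF
CFLAGS="\${CFLAGS} -fprofile-generate=$PGO_DIR"
CXXFLAGS="\${CXXFLAGS} -fprofile-generate=$PGO_DIR"
LDFLAGS="\${LDFLAGS} -fprofile-generate=$PGO_DIR"
EOF

# Phase 2: rebuild using the collected profile data.
cat > "$ETC_DIR/env/pgo-use.conf" <<EOF
CFLAGS="\${CFLAGS} -fprofile-use=$PGO_DIR"
CXXFLAGS="\${CXXFLAGS} -fprofile-use=$PGO_DIR"
EOF

# Map the example package to phase 1; edit this line to pgo-use.conf
# before the second emerge.
echo "x11-libs/pixman pgo-generate.conf" >> "$ETC_DIR/package.env"
```

After the instrumented build has been used for a while, something like find /var/tmp/pgo -name '*.gcda' should list data files. If it lists nothing, the user running the application probably has no write permission on that dir.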
A few observations:
- don't expect OMG FAST performance improvements. All the benchmarks and articles I've read on the subject point towards single-digit percentages. If you're looking for raw performance, overclock your CPU by 50-100 MHz for the same result
- apparently some packages require their dependencies to be compiled with profiling enabled as well. I can emerge xorg-server and mesa with profiling enabled, but they crash with error messages indicating missing profiling tools in their underlying libs
- keep track of what you've built with -fprofile-generate. You don't want to be running stuff in profiling mode if you don't have to, even if the performance hit is barely measurable
- here are the packages I have profiled: xf86-video-ati libXft libdrm libXcomposite libXdamage libXrender pixman. I think I should probably focus on the libs that are used the most, since they seem like the best targets for PGO
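On keeping track of what is still built with -fprofile-generate: one possible approach, assuming instrumented packages are recorded in a single package.env-style file and mapped to an env file with the hypothetical name pgo-generate.conf, is a small helper that lists them. The function name list_instrumented is made up for this sketch.

```shell
#!/bin/sh
# list_instrumented FILE
# Print every package atom in a package.env-style FILE that is mapped to
# the (hypothetical) pgo-generate.conf env file. Comment lines are skipped.
list_instrumented() {
    awk '$1 !~ /^#/ {
        for (i = 2; i <= NF; i++)
            if ($i == "pgo-generate.conf") { print $1; break }
    }' "$1"
}
```

Running list_instrumented /etc/portage/package.env would then show which packages still need their second, -fprofile-use rebuild.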
That's it, that's everything I know about PGO. Feel free to add your own experience or suggestions.