After the unexpectedly successful attempt to speed up 64 bit Gentoo installations using a modified approach to prelinking (see http://forums.gentoo.org/viewtopic-t-49 ... ight-.html for more details), where multiple cases of considerable speed gain have been reported, I was eager to find a similar solution for 32 bit systems.
I did this although it was not clear to me whether such a solution could possibly exist at all. The problem here was the limited virtual address space of the IA-32 architecture.
However, after some experimentation I found a satisfactory result: My script was able to prelink all of my shared libraries to different, unique base addresses.
I wrote that script as a wrapper around the prelink command. The script checks whether my new scheme actually can be applied, and applies it if possible.
If the new scheme does not work for a system, the script falls back to the standard Gentoo prelinking method, which always works but is less efficient. It also remembers that fallback for future runs, in order to avoid futile reattempts that would only waste your time. Once the fallback has occurred, my script is just another way to perform the standard Gentoo prelinking commands.
Which means you can use it anyway: Either it works better than the standard prelink approach, or it does the same.
And even in fallback mode there is a little advantage compared to using the prelink binary directly: Usually you can run the script without any arguments at all (except perhaps for -v / --verbose), because the script will use the right switches for the prelink command automatically.
Using the script, I was able to successfully prelink all shared libraries to unique base addresses on one of my boxes. On this box I use IceWM as a window manager, so there are far fewer SOs installed than on a full-blown KDE system.
On the other (heavily bloated) box the new approach failed, and so it performed a fallback to the standard Gentoo method.
Interestingly, the new approach actually seemed to work on the bloated box also. But when I checked /etc/prelink.conf, I noticed that for some reason the kde-3.5 libraries were not among the set of search paths used by prelink!
So I added /usr/kde/3.5/lib to PRELINK_PATH in /etc/env.d/99local manually, ran env-update to propagate my changes into /etc/prelink.conf, and eventually re-ran my tool. And only then it failed.
So it seems the Gentoo devs have excluded the KDE libraries from prelinking in the standard configuration.
That is, as long as you do not manually add the KDE libraries to the prelink search path (as I did), my new prelinking scheme will typically also work on KDE systems, despite the fact that KDE is too large to be fully prelinked on a 32-bit processor: KDE will just be skipped when prelinking, as it always has been when following the standard Gentoo prelinking guide.
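To see whether a directory such as /usr/kde/3.5/lib is already among prelink's search paths, a quick look at the generated configuration is enough. The following is only a minimal sketch: the function name in_prelink_conf is made up for illustration, and it assumes that /etc/prelink.conf lists one directory per line, optionally preceded by an option such as -l or -h, with -b marking blacklist entries.

```shell
#!/bin/sh
# Hypothetical helper (not part of prelink or of prelink-system):
# succeed if directory $2 is listed as a search path in the prelink
# configuration file $1. Assumes one entry per line, "#" comments,
# and "-b" marking blacklist entries, which are ignored here.
in_prelink_conf() {
    conf=$1
    dir=$2
    awk -v dir="$dir" '
        { sub(/#.*/, "") }                        # strip comments
        NF > 0 && $1 != "-b" && $NF == dir { found = 1 }
        END { exit !found }
    ' "$conf"
}
```

For example, in_prelink_conf /etc/prelink.conf /usr/kde/3.5/lib && echo "KDE 3.5 will be prelinked" tells you whether the KDE libraries take part in prelinking at all.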
If you actually want to prelink KDE, there are 2 choices for you:
- Purchase a 64 bit processor and recompile your system in 64 bit native mode, yielding a full-blown 64-bit Gentoo with a (theoretically) multi-exabyte virtual address space for every single process. Then you can easily add the KDE paths to /etc/env.d/99local, and prelink will certainly never run out of virtual addresses.
- Or just stay with 32 bit and also add the KDE-paths to /etc/env.d/99local. But expect fallback to the rather inefficient Gentoo standard method then. On the other hand, a KDE with partial prelinking is still better than a KDE without any prelinking at all. It can never be as effective as on a 64-bit box, but a moderate speed gain can still be expected.
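If several KDE versions are installed, collecting their lib directories by hand is error-prone. The sketch below prints them as one colon-separated list. It assumes the layout used on my system, /usr/kde/<version>/lib; the function name kde_lib_paths is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the lib directories of all KDE versions
# found below $1 (default: /usr/kde) as one colon-separated list.
# Assumes the layout <base>/<version>/lib.
kde_lib_paths() {
    base=${1:-/usr/kde}
    list=
    for d in "$base"/*/lib; do
        # Skips non-directories and the unexpanded pattern
        # that remains when nothing matches.
        [ -d "$d" ] || continue
        list="${list:+$list:}$d"
    done
    printf '%s\n' "$list"
}
```

Run kde_lib_paths and paste the result into PRELINK_PATH by hand; as far as I know the files in /etc/env.d are not shell scripts, so a command substitution would not be expanded there.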
And here's how to use my script:
- First follow the Gentoo prelink guide (http://www.gentoo.org/doc/en/prelink-howto.xml) except for the Code listing 3.1 in section "Prelink usage".
- Add the following lines to your /etc/env.d/99local script:
Code:
PRELINK_PATH="/usr/local/bin:/usr/local/sbin:/usr/local/opt:/opt"
PRELINK_PATH_MASK="/opt/doc:/opt/include:/opt/info:/opt/man\
:/usr/local/opt/doc:/usr/local/opt/include:/usr/local/opt/info\
:/usr/local/opt/man"
- If you have a 64-bit system and are also using KDE, add even more paths to the colon-separated list of prelink search paths in /etc/env.d/99local. If you are using kde-3.5, add the following path to PRELINK_PATH in that file: /usr/kde/3.5/lib. (Substitute your KDE version for 3.5.) Or even better, just do an ls /usr/kde to see which KDE versions are actually installed, and add the lib paths for all of them to PRELINK_PATH.
- Run env-update.
- Run my script prelink-system (you can find it at the end of this article). Hint: Run the script with the --help option to learn more about its options.
- Run my script again each time your libraries might have been updated, such as after an emerge. This will also prelink any new/updated libraries.
Code:
prelink --verbose
Based on the naive assumption that relocated libraries might remain writable after they have been relocated, the show-writable-code-segments script listed below prints a list of loaded libraries where prelinking was not effective.
Which means: the shorter that list is, the better.
However, I am not sure how correct this naive assumption is. If the dynamic loader resets the memory protection of relocated code pages back to read-only after relocation, this won't work. If you know more about this, please let me know.
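The same check can be tried on a single process without scanning all of /proc. The following is only a sketch under the same naive assumption; the function name writable_code_maps is made up, and the input is expected to be in the format of /proc/<pid>/maps.

```shell
#!/bin/sh
# Hypothetical helper: read text in /proc/<pid>/maps format on stdin
# and print address range and file name of every file-backed mapping
# that is both writable and executable.
writable_code_maps() {
    awk '
        # Field 2 holds the permissions, e.g. "r-xp" or "rwxp".
        # Field 6 is the mapped file; pseudo entries such as [stack]
        # and anonymous mappings are skipped. Only the first word of
        # a path containing blanks is printed.
        NF >= 6 && substr($2, 2, 2) == "wx" && $6 ~ /^\// { print $1, $6 }
    '
}
```

For instance, writable_code_maps < /proc/$$/maps inspects the current shell. An empty result only means that no such mappings exist right now; whether that proves anything about prelinking depends on the open question above.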
If you want to revert what my script has done, run the following command:
Code:
prelink -au && prelink -amR
Nevertheless, here are both scripts:
- Listing "/usr/local/sbin/prelink-system":
Code:
#!/bin/sh
#
# Prelinks system or updates existing prelinks.
#
# $HeadURL: /caches/xsvn/trunk/usr/local/sbin/prelink-system $
# $Author: root $
# $Date: 2006-09-08T20:44:13.264408Z $
# $Revision: 363 $
#
# Written by Guenther Brunthaler in 2006.


# Change this to anything you like.
STATE_FILE="$HOME/.prelink-system.state"


usage() {
    cat <<- "."
prelink-system - apply prelinking to the local system

Usage: prelink-system [ options ]

options:
--full, -f: Process all objects. SLOW.
--incremental, -i: Process only new/updated objects.
--help, -h: Display this help text.
--verbose, -v: Verbose operation.
--dry-run, -n: Simulated run. Shows what would be done.
--quiet, -q, --silent: Don't output anything. Only the return code
  will indicate success or failure. No warnings will be displayed
  at all.

prelink-system is a wrapper script for the "prelink" command.
It applies the best prelink options and parameters for your system.

Prelinking can make your applications start considerably faster,
especially if heavy-weight desktop environments like KDE are used.

prelink-system can operate in two different modes, "full" and
"incremental", as specified using the command line options.

The default mode is "incremental", unless the script is run for the
first time, in which case "full" will be the default.

In full mode, each and every library and executable within
prelink's search path will be processed.
This will take a long time, but will be done thoroughly.

Because of its long running time, full mode should only be used
once a month or so, or after reaching some sort of "system
installation milestone", such as recompiling large parts of the
system.

In incremental mode, only new or updated libraries and executables
within prelink's search path will be examined, and thus there may be
a very small chance of doing things less optimally than in full mode.
However, incremental mode is much faster than full mode, and can
therefore be run after every "emerge" installation operation or on
a daily basis without performance concerns.

Note: The effect of prelinking will only be as good as the settings
of PRELINK_PATH and PRELINK_PATH_MASK which can be customized in
your /etc/env.d/99local configuration file.

I suggest adding the following lines to that configuration file,
unless you know what you are doing and see a stringent reason not
to do so:

---cut here---
PRELINK_PATH="/usr/local/bin:/usr/local/sbin:/usr/local/opt:/opt"
PRELINK_PATH_MASK="/opt/doc:/opt/include:/opt/info:/opt/man\
:/usr/local/opt/doc:/usr/local/opt/include:/usr/local/opt/info\
:/usr/local/opt/man"
---cut here---

Don't forget to run env-update after any change to 99local,
or otherwise your changes won't have any effect.

Version 1.1
Written by Guenther Brunthaler in 2006.
.
}


die() {
    {
        echo "ERROR: $*"
        echo "Use $0 --help for help."
    } >& 2
    exit 1
}


warn() {
    echo "WARNING: $*" >& 2
}


# Install the exit trap which runs the registered clean-up functions.
seh_setup_check() {
    local ME; ME="$0.$$"
    test "$SEH_INIT" = "$ME" && return
    SEH_INIT="$ME"; SEH_DTORS=
    trap seh_process 0
}


# Run all registered clean-up functions, most recently added first.
seh_process() {
    local DTOR TAIL
    while [ -n "$SEH_DTORS" ]; do
        TAIL="${SEH_DTORS#*:}"; DTOR="${SEH_DTORS%$TAIL}"
        SEH_DTORS="$TAIL"; DTOR="${DTOR%:}"
        "$DTOR" || warn "Clean-up function $DTOR() failed!"
    done
}


# Register function $1 to be run when the script exits.
finally() {
    seh_setup_check; SEH_DTORS="$1:$SEH_DTORS"
}


load_state() {
    if [ ! -e "$STATE_FILE" ]; then
        STATE="initial"
        return
    fi
    STATE=
    read STATE < "$STATE_FILE" 2> /dev/null
    test -n "$STATE" || die "Failed reading state from '$STATE_FILE'!"
    LAST_STATE="$STATE"
}


save_state() {
    test "$STATE" = "$LAST_STATE" && return
    echo "$STATE" > "$STATE_FILE" || \
        die "Could not write state file '$STATE_FILE'!"
}


inform() {
    test -z "$VERBOSE" && return
    echo "$*"
}


# Run prelink (or just show the command in dry-run mode); die on failure.
do_prelink() {
    if [ -n "$DRY" ]; then
        echo "# prelink $*"
        return
    fi
    prelink "$@" || die "Could not prelink $*"
}


# Like do_prelink, but report failure to the caller instead of dying.
# Added so that --dry-run never runs the real prelink command.
try_prelink() {
    if [ -n "$DRY" ]; then
        echo "# prelink $*"
        return
    fi
    prelink "$@"
}


out_of_addresses() {
    echo "Out of virtual address space. Full prelinking impossible."
    echo "(Seems it's time for you to purchase some 64 bit processor?)"
    echo "Reverting to inefficient standard prelinking scheme."
    echo "(Still better than no prelinking at all.)"
    inform "Undoing current prelinkage..."
    STATE=conservative
    do_prelink --undo --all
    inform "Prelinking the conservative standard way..."
    inform "(This will be remembered for future sessions.)"
    do_prelink --force --all --random --conserve-memory
}


FULL=
VERBOSE=
QUIET=
DRY=
load_state
if [ "$STATE" = "initial" ]; then
    FULL=1; VERBOSE=1
fi
COPTS=
while true; do
    if [ -z "$COPTS" ]; then
        case "$1" in
            -?*) COPTS="$1"; shift;;
            *) break;;
        esac
    fi
    if [ "${COPTS#--}" = "$COPTS" ]; then
        TAIL="${COPTS#-?}"; # Switch clustering.
        COPT="${COPTS%$TAIL}"; COPTS="${TAIL:+-}$TAIL"
    else
        COPT="$COPTS"; COPTS=
    fi
    case "$COPT" in
        --) break;;
        --help | -h) usage; exit;;
        --version)
            # Synchronize this with the usage text!
            echo "Version 1.1"
            exit;;
        --verbose | -v) VERBOSE=1;;
        --dry-run | -n) DRY=1;;
        --quiet | --silent | -q) QUIET=1;;
        --full | -f) FULL=1;;
        --incremental | -i) FULL='';;
        *) die "Unknown option '$COPT'!";;
    esac
done
test $# = 0 || die "Unexpected excess arguments: $*"
if [ -n "$QUIET" ] && [ -z "$DRY" ]; then
    CMD="${FULL:+--full}"
    exec "$0" "${CMD:---incremental}" > /dev/null 2>& 1
fi
test -z "$DRY" && finally save_state
if [ -n "$FULL" ]; then
    inform "Full system prelink phase 1 - undoing old prelinkage..."
    do_prelink --undo --all
    inform "Full system prelink phase 2 - prelinking binaries..."
    do_prelink --force --all --conserve-memory
    inform "Full system prelink phase 3 - prelinking libraries..."
    if try_prelink --force --libs-only --all --random; then
        STATE=flat
    else
        out_of_addresses
    fi
else
    inform "\"Prelinking\" (relocating) new/updated libraries..."
    # Plain "=" instead of "==": the latter is not valid in every /bin/sh.
    if [ "$STATE" = conservative ]; then
        inform "(Using low-efficiency conservative standard mode.)"
        do_prelink --all --random --conserve-memory
    else
        inform "(Using full-efficiency mode.)"
        if try_prelink --libs-only --all --quick --random; then
            STATE=flat
        else
            out_of_addresses
        fi
    fi
fi
inform "Prelinking complete."
- Listing "/usr/local/bin/show-writable-code-segments":
Code:
#! /usr/bin/perl -w
#
# $HeadURL: /caches/xsvn/trunk/usr/local/bin/show-writable-code-segments $
# $Author: root $
# $Date: 2006-09-06T19:21:09.348653Z $
# $Revision: 356 $
#
# Written in 2006 by Guenther Brunthaler.


use strict;


{
   package File;


   # Create a new file handle container object.
   sub new {
      my $self= shift;
      $self= bless {} if !ref $self || ref $self eq __PACKAGE__;
      return $self;
   }


   # Close any open file.
   sub DESTROY {
      my $self= shift;
      my $fh= $self->{fh};
      return unless $fh;
      undef $self->{fh};
      close $fh or die "Could not close file '$self->{filename}': $!";
   }


   # Open a new file for reading.
   # If already in use, the old handle will be closed first.
   # Returns the new file handle.
   sub open {
      my($self, $filename)= @_;
      # Call the destructor as a method so that it receives $self.
      $self->DESTROY;
      # CORE::open avoids any ambiguity with this package's own open().
      CORE::open(my $fh, '<', $filename)
         or die "Cannot open '$filename': $!";
      $self->{filename}= $filename;
      return $self->{fh}= $fh;
   }
}


my($pid, $fh, $cmd);
my $fobj= File->new;
opendir PROC, '/proc' or die $!;
# Visit the memory map of every process (numeric entries in /proc).
while (defined($_= readdir PROC)) {
   next unless /^(\d+)$/;
   $pid= $1;
   $fh= $fobj->open("/proc/$pid/maps");
   while (defined($_= <$fh>)) {
      # $1 captures the "write" and "execute" permission flags,
      # $2 the name of the mapped file.
      next unless /
         ^ [[:xdigit:]]{4,} - [[:xdigit:]]{4,}
         \s+ [-r] ( [-w] [-x] ) [sp]
         \s+ [[:xdigit:]]{4,}
         \s+ [[:xdigit:]]{2,} : [[:xdigit:]]{2,}
         \s+ \d+
         \s+ (.+?) \s* $
      /x;
      $cmd= readlink("/proc/$pid/exe") || "(process $pid)";
      chomp;
      # Report mappings which are both writable and executable.
      print "$_ PID($pid) $cmd\n" if $1 eq 'wx';
   }
}
closedir PROC or die;
- In order for prelinking to be effective, all binaries and libraries should be assigned a different, unique address range.
- This will allow the loader to just map the code segments of those binaries and libraries into the address space of a process and run them without any changes. That is, no relocations are necessary.
- It will also allow sharing those mapped pages between processes, because it's always the same data, mapped at the same virtual address within all address spaces.
- This will enable you, for instance, to start 100 copies of your favorite KDE word processor, but there will only be a single instance of the word processor's code pages in memory, which will be shared among all those 100 instances.
- Relocation is never an issue for binaries that use SOs (I use "binaries" here to refer to executables other than shared libraries), only for the SOs themselves. This is because all binaries are typically loaded at the same address anyway. That is not a problem, because initially each binary is "alone" in the address space of its process, so no address conflicts can occur before SOs are loaded.
- In contrast to 64 bit processors, where it is not a problem to assign each and every binary and library in the system its own unique virtual address range, 32 bit processors have a rather limited overall virtual address space. So it's not possible to map each binary and library to its own unique memory range, because we would run out of virtual addresses.
- However, as you may have noted from the above, it is not actually necessary to assign each binary and library its own address range: It is sufficient if all the libraries are assigned different address ranges. It does not hurt if the binaries use address ranges that conflict with those of other binaries, as long as they do not conflict with any of the SOs. (It does not hurt because normally there is only a single binary in every address space, whereas multiple SOs will be mapped into the same address space.)
- So, the first important thing my script does is prelinking all objects, binaries as well as libraries, using the --conserve-memory switch of prelink. This may be suboptimal for libraries, but is good enough for binaries and will allow the binaries to share the same address ranges. I also disable address randomization in this phase to make it more likely for each binary to get the same base address.
- After that, a second prelinking pass is performed. This time, only the libraries are relocated, and now without the --conserve-memory switch. This should assign a unique address range to each library, but leave the binaries relocated as they are. Address randomization is also enabled, because it's a nice security feature and won't hurt in this phase either.
- If we are lucky, the total code requirements of all SOs will still fit into the 2 or 3 gigabyte address range available to 32 bit processes, without wasting precious virtual address space for also assigning unique address ranges to binaries.
- The success of this approach therefore depends on whether the combined code segment sizes of all SOs in the system exceed 2 or 3 gigabytes or not.
- Apparently this is not the case on my box, so it seems to work there.
- But more testing on different systems is needed in order to see whether this works equally well on other installations.
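For a first impression of whether a given box has a chance, the sizes of all shared libraries can be added up and compared with the 2 or 3 gigabyte limit. The sketch below is only a rough estimate: it sums file sizes rather than code segment sizes and ignores alignment, and the function names total_so_bytes and fits_in_gib are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the summed file size in bytes of all
# regular files named *.so or *.so.* below the given directories.
# File size is only a rough stand-in for the mapped code size.
total_so_bytes() {
    find "$@" -type f \( -name '*.so' -o -name '*.so.*' \) \
        -exec ls -ln {} + 2> /dev/null |
    awk '{ sum += $5 } END { printf "%.0f\n", sum }'   # field 5 = size
}

# Hypothetical helper: succeed if $1 bytes fit into $2 GiB (default: 2).
fits_in_gib() {
    awk -v n="$1" -v g="${2:-2}" 'BEGIN { exit !(n <= g * 1073741824) }'
}
```

Something like fits_in_gib "$(total_so_bytes /lib /usr/lib)" 2 || echo "probably too large" gives a hint before starting a full run; the directories are just examples, and the real answer still comes from prelink itself.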





