Gentoo Forums
Do backup and restore, so why not optimize too?
Gentoo Forums Forum Index → Other Things Gentoo
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 3:06 am    Post subject: Do backup and restore, so why not optimize too? Reply with quote

Very cool script. Lets you save your entire Gentoo installation to a gzip file. Then you can turn around and use this gzip file to perform a stage 4 install to another machine. I love you guys.

I'm going to use this to backup the Gentoo installation on my laptop, install it on my desktop, and use that to do a Jackass build...
Code:
emerge -e system
emerge -e world

...because my desktop is so much faster. Then I'll create a new stage4, move it to my laptop, and reinstall. I figure it will save me days.

This is kind of a big deal for me, so I'm preflighting the whole thing, because dammit, the Gentoo installation is perfect the way it is and I don't want to screw it up (but at the same time I want all the new goodness emerge --sync teases me with each new day).

So I'm thinking it through, and then it occurs to me, maybe I can make the Gentoo install even faster.

When you do the stage 4 install, really you're just untar'ing a file. The files get written to the disk in the order they were tar'ed (at least I think that's right). What I want to do is figure out a way of tar'ing these files in the order in which they'll be used.

In other words, when I boot my laptop, I want the kernel to be the first file on the drive. The first file the kernel loads I want to be the second file on the drive, that is, immediately adjacent to the kernel file. The second file the kernel loads I want next to the first file. And so on. I want this for Firefox too. And emacs. And especially Gnome.

I don't have any hard data but I have to believe that this will speed up load times by an order of magnitude. You're making better use of HD cache and reducing seek times, simultaneously, right?

When you do a Gentoo install the traditional way, you can't pick where each file goes on the HD. But with a stage 4 install, you can. Why not exploit this?

Why don't I exploit it? Because I don't know how to do this. I have to believe you guys do, though. It seems to me what is required is some means of recording which files are accessed, and in what order, during a typical Gentoo session. This information has to be saved someplace, and then utilized by a tar replacement/enhancement when the above-linked script creates the stage 4.

Of course over time the gains start to degrade as you do new emerges and you rebuild stuff, but the gains don't completely disappear, and besides, you can always repeat the process. Indeed, this might be just the thing to get some of us to make more frequent backups: the promise of a faster system each time we do.

I'm betting either that somebody already thought of this, or that I'm crazy. Or both.

Comments?
Dlareh
Advocate


Joined: 06 Aug 2005
Posts: 2102

PostPosted: Fri Sep 02, 2005 4:24 am    Post subject: Reply with quote

I just do something like this from my slow laptops (where FASTDESKTOP is the hostname or IP to use as a compile slave, and /some/path is a convenient place on it with lots of free space)
Code:
rsync -ax --progress / username@FASTDESKTOP:/some/path

Then on the desktop
Code:
mount --bind /proc /some/path/proc
HOME=/root chroot /some/path /bin/bash
emerge --sync; emerge -uDNf world >&/dev/null & emerge -uDNav world

Then, when it's done, boot the laptop from a LiveCD, mount the hard drive to e.g. /some/path2, and run (from the laptop):
Code:
rsync -ax --progress username@FASTDESKTOP:/some/path /some/path2

No need to mess with gzips or use fancy scripts, and rsync doesn't re-copy files that haven't changed. And it is much, much faster than distcc.

Another option instead of copying stuff over with rsync is to simply boot the slow machine from a livecd, share the gentoo installation's root directory over NFS or SHFS, and then chroot into it on the server and compile things in-place. Matter of taste or whether you have enough free space on the desktop for a full rsync, I suppose.

That stage4 page you link to is mostly useful for backups, though I still prefer using rsync or `rsnapshot' even for that purpose.


About your pre-load idea... I don't know how that would be accomplished. But it wouldn't really matter where the files are stored on the hard drive.
_________________
"Mr Thomas Edison has been up on the two previous nights discovering 'a bug' in his phonograph." --Pall Mall Gazette (1889)
Are we THERE yet?


Last edited by Dlareh on Thu Sep 08, 2005 11:33 pm; edited 2 times in total
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 4:39 am    Post subject: Reply with quote

Interesting.

I would point out though that you don't appear to have optimization opportunities (of the type described in my previous post) with this method.

That said, if I don't ever get optimization working, rsync appears to make more sense. I especially like the fact that it doesn't require me to screw around with the kernel at all.

If there is a way to get optimization working, I'm going to go with that. Well, at least once. See how it works. See what the gains are, if any.

Hey, it's Gentoo. Anything for a little more performance, right?
Dlareh
Advocate


Joined: 06 Aug 2005
Posts: 2102

PostPosted: Fri Sep 02, 2005 4:41 am    Post subject: Reply with quote

nokilli wrote:
I would point out though that you don't appear to have optimization opportunities (of the type described in my previous post) with this

Yeah I just read the rest of your post. Pre-loading would be nice, but it wouldn't matter where the files are on the hard drive. Just let the filesystem take care of accessing them in the normal fashion.

The trick is how to get the kernel to pre-load the files into cache. I don't know how this would be done.

About pre-loading emacs, why not just set your login shell to emacs? That monster is pretty much an operating system all on its own :)
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 5:01 am    Post subject: Reply with quote

Dlareh wrote:
Pre-loading would be nice, but it wouldn't matter where the files are on the hard drive. Just let the filesystem take care of accessing them in the normal fashion.

I think it does matter. You've got the hard drive which caches entire tracks, right? So if you have a bunch of little files that all fit on a track by storing them adjacent to one another you turn what could be, say, a hundred disk accesses into just one.

And then there are the seek times. The seek time is the interval required to move the little arm inside the hard drive from one track to the next, right? So doesn't this also get dramatically reduced if files accessed consecutively are adjacent to each other on the disk?

For really fast drives, esp. RAID, I suppose the gains would be marginal. But my laptop is using a 5400RPM drive that uses the ATA/66 interface. Every little drop of performance that can be slurped off of this thing would be appreciated.
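Back-of-envelope, with made-up but plausible numbers for a drive of this class (these are illustrative figures, not measurements):

```shell
# Rough cost of scattered reads: each small file pays a seek; a sequential
# track read pays (roughly) one. Numbers below are assumptions.
FILES=100        # small files to load
SEEK_MS=10       # assumed average seek + rotational latency per file
echo "scattered reads: ~$(( FILES * SEEK_MS )) ms lost to seeks alone"
```

So even if each file is tiny, the scattered case burns on the order of a second in head movement before a single byte of payload arrives.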

Anyway, it'd be neat to give it a try, I think. This is Gentoo GNU/Linux. Being able to try such wacky things is one of the reasons I'm drawn to this distro/OS.
Dlareh
Advocate


Joined: 06 Aug 2005
Posts: 2102

PostPosted: Fri Sep 02, 2005 5:13 am    Post subject: Reply with quote

It's a nice thought, but a moot point without first getting the kernel to preload files.
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 5:40 am    Post subject: Reply with quote

Why does the kernel have to preload anything?

The files are adjacent. The kernel is loaded, and itself begins to load files. Files that are adjacent to each other on the disk. It loads the first file. That in effect loads all of the other files within that same track into the hard drive's cache. When the kernel goes to load the next file, it's already in the hard drive's cache. No seek is required. No disk access at all. In other words, there's no wait, beyond that imposed by the bus.

Let's also consider now that all of these files are contiguous, i.e., most of the time the blocks that make up that file sit on the same track. That has to speed things up in and of itself.

Getting the kernel to preload would speed things up even more I guess, I hadn't considered that. But it isn't the only game being played here.

That said, what would be the difficulty in making a simple hack to get the kernel to preload just once during boot? Create a new config option that lets you set the number of blocks at the front of the drive to cache on startup, the idea being that because we've carefully laid out how the files are stored on disk, these will be the files accessed during the ramp up to runlevel 5.

When you create the initial profile that records which files are accessed during startup you can also probably obtain the optimal value for the number of blocks to preload. You know where they are, they're at the front of the disk. It would be a one-shot deal on the part of the kernel, something it could do during initialization, then forget about, and it could conceivably perform the preloading for all of the apps you run at startup, e.g., Gnome, emacs, Firefox, etc.
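A rough sketch of what that one-shot pre-read would look like, doable even from an init script in userspace (here a scratch file stands in for the raw disk; the device name and megabyte count would be yours to choose):

```shell
# Stand-in for the raw disk; on the real laptop this would be e.g. /dev/hda
dd if=/dev/zero of=/tmp/fakedisk bs=1M count=4 2>/dev/null

# The pre-read itself: stream the first N MB through dd and discard the data.
# The side effect we care about is that those blocks now sit in the kernel's
# page cache, so the boot-time reads that follow hit memory, not the platter.
PRELOAD_MB=2
dd if=/tmp/fakedisk of=/dev/null bs=1M count=$PRELOAD_MB 2>/dev/null \
  && echo "pre-read $PRELOAD_MB MB"
```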
Dlareh
Advocate


Joined: 06 Aug 2005
Posts: 2102

PostPosted: Fri Sep 02, 2005 5:55 am    Post subject: Reply with quote

By preload I mean "put into cache" so it doesn't have to be loaded from disk.

Whether or not the files are contiguous wouldn't necessarily matter (it might help, but it wouldn't be noticeable to the user).
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 6:34 am    Post subject: Reply with quote

Dlareh wrote:
Whether or not the files are contiguous wouldn't necessarily matter (it might help, but it wouldn't be noticeable to the user)

We'll just have to disagree.

I mean, if what you're saying is true, then the whole notion of random access being more expensive than sequential access can't be true. To say nothing of seek times. I mean, why even care about seek times if they don't matter?

And then why do so many users work to defragment their hard drives? If what you're saying is true, they're just wasting their time. But I know that isn't so, because I've experienced performance gains when defragmenting my own hard drive.

So let's just agree to disagree.
Dlareh
Advocate


Joined: 06 Aug 2005
Posts: 2102

PostPosted: Fri Sep 02, 2005 6:41 am    Post subject: Reply with quote

defragmenting is one thing, and important

making sure a bunch of arbitrary files are contiguous TO EACH OTHER is quite another, and much less useful
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 7:05 am    Post subject: Reply with quote

Defragmenting is important because it makes all of the blocks that make up a file adjacent to one another. That's how defragmentation provides an increase in hard disk performance.

Why wouldn't the same benefit be realized if the files themselves were adjacent to one another?

Consider it a rhetorical question.
Dlareh
Advocate


Joined: 06 Aug 2005
Posts: 2102

PostPosted: Fri Sep 02, 2005 7:20 am    Post subject: Reply with quote

nokilli wrote:
Defragmenting is important because it makes all of the blocks that make up a file adjacent to one another. That's how defragmentation provides an increase in hard disk performance.

Why wouldn't the same benefit be realized if the files themselves were adjacent to one another?

Because they will never be read in that order, and even so it's much more useful to have all files sorted alphabetically by directory
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 7:25 am    Post subject: Reply with quote

OK I'm going to suggest that we let somebody else comment now.
nevynxxx
Veteran


Joined: 12 Nov 2003
Posts: 1123
Location: Manchester - UK

PostPosted: Fri Sep 02, 2005 7:45 am    Post subject: Reply with quote

Dlareh wrote:
nokilli wrote:
Defragmenting is important because it makes all of the blocks that make up a file adjacent to one another. That's how defragmentation provides an increase in hard disk performance.

Why wouldn't the same benefit be realized if the files themselves were adjacent to one another?

Because they will never be read in that order, and even so it's much more useful to have all files sorted alphabetically by directory


nokilli's entire point is that they will be put in the order they are read on startup. So yes, they will be read in that order, on every boot.

If you're talking about a laptop, that's quite often, and with poor average seek times it could give a performance boost.

Theoretically it sounds like a reasonable way to speed up booting. But it seems like a lot of effort.

@Dlareh: it's a fairly simple task. Just work out the correct order, and dd each file in the boot process individually. Only bother up to the point where the system is booted, though, to either a graphical or CLI shell. After that you really won't get any more gain.

Also, when things like init files change, if they grow, they could always move....
_________________
My Public Key

Wanted: Instructor in the art of Bowyery
Dlareh
Advocate


Joined: 06 Aug 2005
Posts: 2102

PostPosted: Fri Sep 02, 2005 7:54 am    Post subject: Reply with quote

ah, I was thinking of cp'ing to /dev/null, but dd is probably the way to go.

What I'm going to do when I find the time is

1) disable noatime on my filesystems
2) reboot, load every single program I use frequently* (in the order I most frequently use them)
3) run a find that will identify all these recently accessed files that have not been modified within the past, say, 2 days
4) add those files to a massive dd list, as you suggest
5) re-enable noatime

* I should probably make a script for this so I can do time comparisons with and without the dd. Should be interesting.
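Steps 3 and 4 might look roughly like this (run here against a scratch tree instead of /, with the cutoffs as placeholder values):

```shell
#!/bin/sh
# Scratch tree standing in for /, so the sketch is safe to run anywhere
ROOT=/tmp/atime-demo
mkdir -p "$ROOT"
echo data > "$ROOT/libfoo.so"
touch -m -t 200508150000 "$ROOT/libfoo.so"   # old mtime, recent atime

# Step 3: accessed recently (here: within a day) but not modified in 2+ days
find "$ROOT" -type f -atime -1 -mtime +2 > /tmp/dd-list

# Step 4: dd each listed file through /dev/null to pull it into the cache
while read f; do
    dd if="$f" of=/dev/null bs=1M 2>/dev/null
done < /tmp/dd-list
echo "pre-read $(wc -l < /tmp/dd-list) file(s)"
```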
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 8:24 am    Post subject: Reply with quote

nevynxxx wrote:
Theoretically it sounds like a reasonable way to increase boot times. But it seems a lot of effort to do.

You mean like, creating a distro that lets people compile their GNU/Linux systems from source? :)

Look, I'd be happy to do the work. I just need a lead on how to record when a file is being opened. That's it. Given that, I can do all of this with a fairly terse Python script.

Unless we're talking about the kernel hack that loads x number of blocks at the start of the disk into memory. I'm pretty sure we'd need to do C with that. :)

The hack is a low priority though. I don't think it will give us nearly the speed gains as this optimization thing I'm talking about would.

So to move the conversation forward, let's hear about ideas as to how we get to know when files are being opened. That has to be another kernel hack? Can't be... I mean, it's too useful a capability... I have to believe somebody has already crafted this hook and gives the functionality to us in userspace.

Where is it?
Dlareh
Advocate


Joined: 06 Aug 2005
Posts: 2102

PostPosted: Fri Sep 02, 2005 8:31 am    Post subject: Reply with quote

nokilli wrote:
So to move the conversation forward, let's hear about ideas as to how we get to know when files are being opened. That has to be another kernel hack? Can't be... I mean, it's too useful a capability... I have to believe somebody has already crafted this hook and gives the functionality to us in userspace.

Where is it?

You don't think atime will be good enough?
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 9:09 am    Post subject: Reply with quote

Dlareh wrote:
You don't think atime will be good enough?

You know, I dismissed atime at first, but if that's all we have, then let's see what we can do with it.

My concern with atime, of course, is that it doesn't record the order in which files are accessed (which is what we want), but only when each file was last accessed. So, for instance, a file that is accessed very early in the boot cycle but then accessed again much later would be recorded as accessed late in the boot cycle, and so wouldn't be optimally placed on the hard drive.

But maybe I'm being too anal with this. How many files are we talking about here that would suffer this? Probably not many, right?

Certainly, version 1 could use atimes, and then we could see where we are.

So some crude pseudocode for this would be as follows:

- Somehow record the time at which the computer was booted.

- Provide a facility by which the user could invoke a process that would then catalogue all files by their atime.

- Reprocess the catalogue so that files with atimes occurring after the boot appear at the beginning, sorted by atime, and files with atimes occurring before the boot (i.e., files we don't care about caching) appear at the end.

- Devise a way of using this catalogue -- this list -- to control the order in which files are added to a tar archive.

And that would be it?
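Sketching it in shell, assuming GNU find and GNU tar (%A@ prints the atime as seconds since the epoch, and tar's -T archives members in exactly the order the list gives them). A scratch tree with faked atimes stands in for a real recorded access order of gamma, alpha, beta:

```shell
#!/bin/sh
# Scratch tree with a faked access order: gamma first, then alpha, then beta
ROOT=/tmp/stage4-demo
mkdir -p "$ROOT"
for f in alpha beta gamma; do echo "$f" > "$ROOT/$f"; done
touch -a -t 200509020300 "$ROOT/gamma"
touch -a -t 200509020301 "$ROOT/alpha"
touch -a -t 200509020302 "$ROOT/beta"

# The catalogue: every file stamped with its atime, earliest access first
# (sort -n orders by the timestamp, cut then drops it, leaving the paths)
find "$ROOT" -type f -printf '%A@ %p\n' | sort -n | cut -d' ' -f2- > /tmp/catalogue

# Archive in exactly that order; --no-recursion stops tar from re-ordering
# by directory traversal, so extraction writes the files back list-first
tar --no-recursion -cf /tmp/stage4.tar -T /tmp/catalogue 2>/dev/null
tar -tf /tmp/stage4.tar   # members come out gamma, alpha, beta
```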
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 9:15 am    Post subject: Reply with quote

nevynxxx wrote:
It is a fairly simple task, just work out the correct order, and dd each file in the boot process individually.

OK this confuses me. How does dd apply here? I know it's an incredibly powerful utility and all, but how does it let us arrange the order in which files are laid out on the hd?

I've got it in my head that all of the files have to be archived somehow, the partition given a clean fs, and then the files read back from the archive in order to achieve the proper ordering.

dd somehow lets us circumvent this? How can that be?
Dlareh
Advocate


Joined: 06 Aug 2005
Posts: 2102

PostPosted: Fri Sep 02, 2005 9:30 am    Post subject: Reply with quote

dd would be used to pre-read the files so they (hopefully) reside in cache
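i.e. something like this per file (scratch file for illustration); the read itself is thrown away, but the pages stay cached:

```shell
# Create a stand-in file, then pull it through dd into the page cache
echo "pretend this is /bin/bash" > /tmp/somefile
dd if=/tmp/somefile of=/dev/null bs=1M 2>/dev/null && echo "cached /tmp/somefile"
```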
nevynxxx
Veteran


Joined: 12 Nov 2003
Posts: 1123
Location: Manchester - UK

PostPosted: Fri Sep 02, 2005 10:58 am    Post subject: Reply with quote

nokilli wrote:
nevynxxx wrote:
It is a fairly simple task, just work out the correct order, and dd each file in the boot process individually.

OK this confuses me. How does dd apply here? I know it's an incredibly powerful utility and all, but how does it let us arrange the order in which files are laid out on the hd?

I've got it in my head that all of the files have to be archived somehow, the partition given a clean fs, and then the files read back from the archive in order to achieve the proper ordering.

dd somehow lets us circumvent this? How can that be?


I may be wrong, but as far as I remember, with dd you can copy data to an arbitrary place, i.e. you can more easily do the actual ordering on the hard drive that the OP wants.

cp may also work.

As for working out how often you open things. I say don't bother, it isn't going to get you much.

Consider that most apps will have data files and/or cache files. These will constantly change in size (depending on the app), which means they *cannot* stay physically local.

You can speed up boot like this, but not actual working speed.
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 195

PostPosted: Fri Sep 02, 2005 5:26 pm    Post subject: Reply with quote

nevynxxx wrote:
with dd you can copy data to an arbitary place


This strikes me as dangerous. How do you know you're not overwriting another file? Or are we talking about doing this to a clean filesystem? Then I wouldn't see why cp wouldn't work equally well. It doesn't matter exactly where the files reside, but only that consecutively accessed files reside adjacent to one another. I can't believe cp would copy one file to one part of the disk and then when copying the next file place it somewhere across the platter, not routinely anyways.

So what I'm getting out of this is that rsync for my purposes won't work. Existing files preclude the opportunities to optimize the drive's contents.

I will want to go with the stage 4 script I initially talked about. All I have to do is somehow control the order in which files are added to the tar archive according to their atimes.

If I make this work I will come back and post some benchmarks.
nevynxxx
Veteran


Joined: 12 Nov 2003
Posts: 1123
Location: Manchester - UK

PostPosted: Mon Sep 05, 2005 9:00 am    Post subject: Reply with quote

nevynxxx wrote:
cp may also work.


Please don't quote and then make points that I also made after the quote.

I would assume you were doing this to a clean system, or rather, I would assume you wanted to ignore anything that was already on the file system.

Also, as I said in the original post, if you were to use dd for this, you would have to be *very* careful and take a lot of time over it.

It would ensure that the data was put where you wanted, though; with cp you couldn't be sure it actually put the files next to each other (not that I'm saying it wouldn't, but would you know if it didn't?).