xiaosuo n00b
Joined: 01 Apr 2004 Posts: 47 Location: china
Posted: Sat Jan 01, 2005 10:27 am Post subject: [advice about portage]portage waste lots of diskspace
Today I typed:
Code: | du -sk portage/
889096 portage/
|
The portage directory wastes lots of disk space, so I think we should change how Portage works. We could learn from Debian, where an update just downloads the list of packages and their dependencies. We do not need to download all of the ebuilds; we could download just the ebuilds we really need to emerge, and delete them once they are installed successfully. I think this is not difficult to implement. By the way, wouldn't this method also be good for the sync servers? Don't you think so? _________________ I love freedom
and I want to fly
so I love linux
because I love linux
I choose gentoo
Last edited by xiaosuo on Sun Jan 02, 2005 2:53 am; edited 1 time in total
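To make the idea concrete, here is a minimal Python sketch (Python because Portage itself is written in it). It assumes a hypothetical downloaded index that maps each package to the packages it depends on; the index layout, the package list and the build_order helper are illustrations, not Portage's real format or API. Given such an index, a client could work out a build order locally with a depth-first topological sort and then fetch only those ebuilds:

```python
from typing import Dict, List, Set

# Hypothetical index: package name -> packages it depends on.
# In the proposal, this is what would be downloaded instead of the full tree.
INDEX: Dict[str, List[str]] = {
    "app-editors/vim": ["sys-libs/ncurses", "dev-lang/python"],
    "dev-lang/python": ["sys-libs/zlib"],
    "sys-libs/ncurses": [],
    "sys-libs/zlib": [],
}


def build_order(target: str, index: Dict[str, List[str]]) -> List[str]:
    """Return target plus its dependencies, dependencies first (DFS topological sort)."""
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(pkg: str) -> None:
        if pkg in done:
            return
        if pkg in in_progress:
            raise ValueError(f"circular dependency involving {pkg}")
        in_progress.add(pkg)
        for dep in index[pkg]:
            visit(dep)
        in_progress.remove(pkg)
        done.add(pkg)
        order.append(pkg)  # appended only after all of its dependencies

    visit(target)
    return order


if __name__ == "__main__":
    # Only these ebuilds would need to be fetched to emerge vim.
    print(build_order("app-editors/vim", INDEX))
```

This deliberately ignores versions, SLOTs, KEYWORDS and USE flags, which is where the real difficulty lies.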
seank l33t
Joined: 08 Jul 2004 Posts: 686
Posted: Sat Jan 01, 2005 10:40 am
Code: | [~] # du -sk /usr/portage
433704 /usr/portage
[~] # |
You probably have packages in /usr/portage/distfiles, which are safe to remove, but they can come in handy sometimes.
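To see how much of the tree is distfiles and how much is everything else, a small Python sketch like the following could help. It is an illustration, not an existing Gentoo tool: tree_size is a hypothetical helper that sums apparent file sizes, so its totals will come out lower than du -sk, which counts allocated blocks.

```python
import os
from typing import Tuple


def tree_size(root: str, skip: str = "distfiles") -> Tuple[int, int]:
    """Return (bytes outside root/skip, bytes inside root/skip).

    Uses apparent file sizes (st_size), not allocated blocks, so the totals
    are smaller than what `du -sk` reports. Symlinks are not followed.
    """
    skip_path = os.path.join(root, skip)
    outside = inside = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        in_skip = dirpath == skip_path or dirpath.startswith(skip_path + os.sep)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # don't count or follow symlinks
            size = os.path.getsize(path)
            if in_skip:
                inside += size
            else:
                outside += size
    return outside, inside
```

For example, tree_size("/usr/portage") returns a (tree, distfiles) pair in bytes.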
Zarhan l33t
Joined: 27 Feb 2004 Posts: 996
Posted: Sat Jan 01, 2005 11:50 am Post subject: Re: portage waste lots of diskspace
xiaosuo wrote: | today i typed:
Code: | du -sk portage/
889096 portage/
|
the portage directory waste lots of diskplace, so i think we should change the portage work function. |
As noted by the other poster, you probably have all the source tarballs you downloaded in portage/distfiles. You can safely remove them.
Still too much waste?
http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=3&chap=5 - check out the "Excluding packages/categories" section. Then you can ditch stuff you never need (such as the i18n categories if you only use English), etc.
jbc28 Apprentice
Joined: 07 Jan 2003 Posts: 205 Location: Edinburgh
Posted: Sat Jan 01, 2005 12:08 pm
Just downloading an ebuild list won't work without some alterations, as an ebuild's dependencies can depend on its USE flag settings.
Downloading a list of ebuilds without any further info (no dependency information) might be better: Portage could then just fetch the ebuilds as needed, though it would still need to know the status of each package (x86, ~x86, -x86, etc.). The problem then would be if you're left without net access but have precompiled binaries, as I think the *.tbz2 files need the relevant ebuilds. Perhaps the solution would be to add the appropriate ebuild to the tbz2, so that it works standalone.
I'm tempted to agree that Portage does use a lot of disk space, but more importantly it perhaps uses more bandwidth than necessary. Maybe adding a 'fetch-ebuild' flag could make it fetch ebuilds as needed?
J
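jbc28's first point is that an ebuild's dependencies are not a fixed list. A minimal Python sketch, assuming a simplified form of the dependency syntax (only plain atoms and flag? ( ... ) / !flag? ( ... ) groups; no || ( ... ) groups, version operators or blockers), shows how the same string yields different dependency sets for different USE flags. evaluate_depend and the example string are hypothetical:

```python
from typing import List, Set


def evaluate_depend(depend: str, use: Set[str]) -> List[str]:
    """Return the atoms that apply for the given USE flags.

    Handles plain atoms plus "flag? ( ... )" and "!flag? ( ... )" groups,
    including nested ones. Any-of groups ("|| ( ... )") are NOT handled.
    """
    tokens = depend.split()
    pos = 0

    def parse_group(depth: int) -> List[str]:
        nonlocal pos
        atoms: List[str] = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == ")":
                if depth == 0:
                    raise ValueError("unbalanced ')'")
                return atoms  # end of the current group
            if tok.endswith("?"):
                flag = tok[:-1]
                negate = flag.startswith("!")
                if negate:
                    flag = flag[1:]
                if pos >= len(tokens) or tokens[pos] != "(":
                    raise ValueError(f"expected '(' after {tok}")
                pos += 1  # skip "("
                inner = parse_group(depth + 1)  # always parsed, kept only if active
                if (flag in use) != negate:
                    atoms.extend(inner)
            else:
                atoms.append(tok)
        if depth > 0:
            raise ValueError("missing ')'")
        return atoms

    return parse_group(0)


if __name__ == "__main__":
    depend = ("sys-libs/zlib ssl? ( dev-libs/openssl ) "
              "!minimal? ( sys-libs/readline gtk? ( x11-libs/gtk+ ) )")
    print(evaluate_depend(depend, {"ssl"}))
    # ['sys-libs/zlib', 'dev-libs/openssl', 'sys-libs/readline']
    print(evaluate_depend(depend, {"minimal", "gtk"}))
    # ['sys-libs/zlib']
```

So a pre-computed dependency list would have to be either per USE combination or kept in this conditional form and evaluated on the client.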
Torangan Apprentice
Joined: 21 Mar 2003 Posts: 178
Posted: Sat Jan 01, 2005 4:09 pm
If you'd like to save bandwidth, you'd probably do better to suggest a feature that uses patches instead of full downloads whenever possible. That way Portage could save a great deal of bandwidth. I think it should be possible to offer patches against the previous version, which would have to be present in distfiles, and then create the new tarball from the previous version plus the patch.
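Torangan's idea is essentially delta encoding. A rough Python sketch under stated assumptions: the ("copy", offset, length) / ("insert", data) operation format and both helpers are hypothetical, and the standard difflib module stands in for a real binary delta tool such as xdelta or bsdiff. The client rebuilds the new tarball from the old one in distfiles and checks it against a SHA-256 digest assumed to be published alongside the delta:

```python
import difflib
import hashlib
from typing import List, Tuple

# Hypothetical delta format: ("copy", offset, length) reuses bytes of the old
# tarball already in distfiles; ("insert", data) carries new bytes.
Op = Tuple


def make_delta(old: bytes, new: bytes) -> List[Op]:
    """Server side: describe `new` in terms of `old` (naive, quadratic worst case)."""
    ops: List[Op] = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete": those old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops: List[Op], expected_sha256: str) -> bytes:
    """Client side: rebuild the new file and verify it against a known digest."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    if hashlib.sha256(out).hexdigest() != expected_sha256:
        raise ValueError("rebuilt file does not match the expected checksum")
    return bytes(out)
```

Deltas like this work best on uncompressed tar data: in a .tar.gz or .tar.bz2, a small source change tends to alter most of the compressed bytes.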
xiaosuo n00b
Joined: 01 Apr 2004 Posts: 47 Location: china
Posted: Sat Jan 01, 2005 4:55 pm
I'm sorry for forgetting about the "/usr/portage/distfiles" directory, but look:
Code: | xiaosuo@center portage $ du -sk distfiles
445524 distfiles
xiaosuo@center portage $ cd ..
xiaosuo@center usr $ du -sk portage
890412 portage
|
As you can see, the ebuilds really do use lots of disk space: even without distfiles, the tree takes 890412 - 445524 = 444888 KB (about 434 MB)!
http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=3&chap=5
The method in the link above is not good!
I think we should change it, because our Portage tree keeps growing bigger and bigger, and this really is a waste of disk space, even though we have large hard disks. Consider that some software is only a few kilobytes in size.
oberyno Guru
Joined: 15 Feb 2004 Posts: 467 Location: /bin/zsh
Posted: Sat Jan 01, 2005 9:32 pm
rsync_excludes works for me:
Code: | head /etc/portage/rsync_excludes
**app-antivirus
**app-benchmarks
**app-forensics
**app-gnustep
**app-i18n
**app-laptop
**app-pda
**app-xemacs
**dev-ada
**dev-embedded |
Note: having ** at the beginning tells emerge --sync to ignore those categories under metadata/ as well.
Also, you might want to think about using reiser4 for /usr/portage. For example, media-sound takes up 2980 bytes on my reiser4 partition, while on my ext2 partition it takes up 3922 bytes. This is due to reiser4 "slumming" small files into the same node.
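The effect oberyno describes comes from block rounding: on a filesystem without tail packing, every non-empty file occupies at least one whole block. A back-of-the-envelope Python sketch (allocated_bytes is a hypothetical helper that ignores inodes, directories and indirect blocks, so it is a lower bound) shows why a tree of many small files is so sensitive to block size:

```python
from typing import Iterable


def allocated_bytes(sizes: Iterable[int], block_size: int) -> int:
    """Space used when each file is rounded up to whole blocks (empty files use none)."""
    # -(-size // block_size) is ceiling division for non-negative integers.
    return sum(-(-size // block_size) * block_size for size in sizes)


if __name__ == "__main__":
    # Hypothetical tree: 1000 small files of 500 bytes each.
    sizes = [500] * 1000
    print(sum(sizes))                    # 500000 bytes of actual data
    print(allocated_bytes(sizes, 1024))  # 1024000 with 1K blocks
    print(allocated_bytes(sizes, 4096))  # 4096000 with 4K blocks
```

A filesystem that packs small files together ends up much closer to the first number.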
xiaosuo n00b
Joined: 01 Apr 2004 Posts: 47 Location: china
Posted: Sun Jan 02, 2005 2:50 am
I use ext3. Using reiser4 is a good idea, but not the ultimate solution.
If we changed the way Portage works, we would just need to download each ebuild's KEYWORDS, SLOT, USE, DEPEND, ... We could design a good data structure for this ebuild information, compress it with LZO (a fast compressor), and download that instead of syncing the whole Portage tree.
We would not need the Portage cache, because the LZO file would also be the cache. This method could speed up searching and dependency calculation.
With so many benefits, WHY NOT change?
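A Python sketch of the kind of compact metadata file xiaosuo describes, under clear assumptions: the record fields and helper names are hypothetical, and zlib from the standard library stands in for LZO, which Python does not ship. The point is that one compressed blob can double as the search cache:

```python
import json
import zlib
from typing import Dict, List

# Hypothetical index: package-version -> only the fields a resolver or search needs.
Index = Dict[str, Dict[str, str]]


def pack_index(index: Index) -> bytes:
    """Serialize to compact JSON and compress (zlib stands in for LZO here)."""
    raw = json.dumps(index, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw, 9)


def unpack_index(blob: bytes) -> Index:
    """Decompress and parse the blob back into the index dictionary."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def search(index: Index, term: str) -> List[str]:
    """Name search served straight from the index, no directory walk needed."""
    return sorted(cpv for cpv in index if term in cpv)


if __name__ == "__main__":
    index = {
        "app-misc/foo-1.0": {"KEYWORDS": "x86 ~amd64", "SLOT": "0",
                             "IUSE": "ssl", "DEPEND": "ssl? ( dev-libs/openssl )"},
        "app-misc/foobar-2.1": {"KEYWORDS": "~x86", "SLOT": "2",
                                "IUSE": "", "DEPEND": "app-misc/foo"},
    }
    blob = pack_index(index)
    print(len(blob), "compressed bytes")
    print(search(unpack_index(blob), "foo"))
```

Real use would need incremental updates rather than re-downloading the whole blob on every sync.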
Zarhan l33t
Joined: 27 Feb 2004 Posts: 996
Posted: Sun Jan 02, 2005 12:12 pm
If the disk space issue is such a big deal, then just share the /usr/portage directory over NFS (or any other network filesystem). Then you only need to host the files once for all your local computers.
Support for diffs (i.e. when you have kernel 2.6.9-sources and upgrade to 2.6.10, Portage downloads only the patchset, not the entire source package) will probably land soon (there are already multiple approaches; search these forums). That should solve some of the bandwidth usage issues.
Also, if you want a compressed version, just use Portage snapshots (get the tarball). emerge-webrsync does just what you propose: it downloads a "snapshot" tarball. Maybe you can access it directly via squashfs or something instead of decompressing it, but it's there.
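If you just want to peek into a snapshot without unpacking it, Python's standard tarfile module can read single members out of a compressed tarball. This is a sketch, not part of emerge-webrsync; member paths assume the snapshot's files live under a top-level portage/ directory. Unlike a squashfs image, a compressed tarball must be decompressed sequentially up to the requested member, so this suits occasional lookups, not dependency resolution:

```python
import tarfile
from typing import Optional


def read_from_snapshot(snapshot: str, member: str) -> Optional[str]:
    """Return one file from a compressed snapshot without unpacking it to disk.

    Mode "r:*" lets tarfile detect gzip/bzip2 compression by itself.
    """
    with tarfile.open(snapshot, "r:*") as tar:
        try:
            fileobj = tar.extractfile(member)
        except KeyError:
            return None  # no such member in the archive
        if fileobj is None:
            return None  # member exists but is not a regular file
        return fileobj.read().decode("utf-8")
```

For example, read_from_snapshot("portage-YYYYMMDD.tar.bz2", "portage/app-misc/foo/foo-1.0.ebuild") (hypothetical names) would return the ebuild text, or None if it is not in the archive.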
This is the way it has been since forever, and by "forever" I'm also including the FreeBSD ports tree. Granted, FreeBSD offers binary packages for far more packages and they get updated more often than 4 times a year, but still, it's a good system.
The balance between which parts to host on the client (your) computer and which on the network was decided long ago, and I doubt it will change. There are guidelines, such as what kinds of files you can include in the /usr/portage/<category>/<package>/files directory (small patches can be handily stored there) or when you should just put a URL into the ebuild for download. Shifting more of the stuff to the server side - well, you'll suddenly need more powerful servers.
Also, in its current form Portage is really simple, as in elegant. It's a notch up from FreeBSD's ports, which is really just a collection of makefiles, and it handles a few things better (such as SLOTs). The point is that the network transactions caused by Portage (or ports) are really simple: just wget a file via HTTP from a world of mirrors. No database access, nothing special.
With a (completely) server-based model you would first get a list of packages, choose one and send it back to the server; the server would have to calculate your dependencies and send the list back; you would check which ones you already have installed, request ebuilds for the rest, etc. Yes, this can be optimized a bit. Still, the load on the servers would increase - there are many more users than there are servers.
You see, even if you store the dependency info locally (the IUSE, RDEPEND lines) and perform the dep calculations locally, it would still increase complexity and server load. Right now, the only stuff that needs to be downloaded is the STANDARD source tarballs - directly from the software authors, with a gazillion mirrors. If the rest of the ebuild information had to be downloaded separately for each emerge, you would need either "Gentoo-specific" source tarballs with the information attached OR centralized ebuild servers. Considering that there are probably millions of Gentoo users emerging this and that every minute, these servers would be under heavy load constantly. So there would have to be a heavy mirror infrastructure just for Gentoo, not usable by anyone else. Rsync is much better - you just sync to it about once a week - and you can even host your own local rsync mirror for your own network.
Considering all these possibilities to squeeze Portage into a smaller footprint (hosting over NFS, rsync_excludes for those games-* categories you don't want, cleaning up your distfiles, compressing filesystems), I think this will not change in the foreseeable future.
tcbounce Tux's lil' helper
Joined: 18 Nov 2003 Posts: 86 Location: South Korea
Posted: Mon Jan 17, 2005 7:10 am Post subject: The best fix for this by far :)
Hi,
I have 4 computers on a LAN where I live, and only one of them has a Portage tree. I do the NFS mount thing as well, but with one extra benefit.
If you run SELinux, Portage will die when writing to NFS shares (after building a package). The fix is to use the experimental NFS patch and NFS tools; then your security labels are handled automatically over the network.
I have downloaded some modifications to Portage that clean out old distfiles if they are not current for the architectures (ARCH_KEYWORDS) I use.
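tcbounce doesn't say which modifications he uses, so the following is only a hypothetical dry-run sketch of the idea in Python: given the set of distfile names still referenced by the ebuilds you care about (building that set from SRC_URI is left out), list everything else in the distfiles directory as a removal candidate. stale_distfiles is an illustrative name, not a Portage function:

```python
import os
from typing import Iterable, List


def stale_distfiles(distdir: str, still_needed: Iterable[str]) -> List[str]:
    """Return paths of regular files in distdir whose names are not in still_needed.

    Dry run only: nothing is deleted. Subdirectories are skipped.
    """
    keep = set(still_needed)
    stale = []
    for name in sorted(os.listdir(distdir)):
        path = os.path.join(distdir, name)
        if os.path.isfile(path) and name not in keep:
            stale.append(path)
    return stale
```

Reviewing the returned list before passing each path to os.remove keeps a wrong keep-set from wiping files you still need.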
Finally, /usr/portage, /usr/portage/distfiles and /usr/portage/packages are not the only things you should be sharing!
I share my /etc/portage and my Portage overlays over NFS, so I can have heaps of custom ebuilds and don't have to tweak my Portage settings on any of the other computers. It's like software installation profiles under M$, but better!
If the machines are separated by some distance, then instead of NFS you should set up a Portage rsync mirror for your tree and other Portage files (overlays & conf), and use Portage's PORTAGE_BINHOST feature to install the binary packages via FTP.
There is a nice HOWTO about creating a Portage binary mirror. I'll post it to the forum in about 10 minutes, when I find it, along with how to automatically clean out distfiles and packages (or at least packages; there are other scripts for distfiles).
ttuttle Tux's lil' helper
Joined: 03 Oct 2004 Posts: 131
Posted: Sun Jan 23, 2005 6:03 am
xiaosuo wrote: | i use ext3. using reserfs4 is a good idea, but not the ultimate method.
if we change the portage work function. We just need to download the ebuilds's KEYWORDS SLOT USE DEPENDS ...., and we can design a good data struct for these ebuilds infomation, and use lzo(quick compress) compress this files, and we can download this instead of sync with the portage.
we do not need the portage cache because the lzo file is also the cache. this method can speed up searching and depending.
With so much benefit, WHY NOT change ? |
I think you're being a little quick to suggest gigantic changes to Portage. One benefit of the current system that you completely miss is offline building. If all the ebuilds live on the server and are downloaded only as needed, you can't rebuild a package without an internet connection. With the current system, however, the ebuilds and distfiles (until you delete them) are available and can be used to recompile the package at any time.
Skinkie n00b
Joined: 13 Nov 2004 Posts: 27 Location: The Netherlands
Posted: Mon Aug 21, 2006 5:19 pm Post subject: Sql?
Zarhan wrote: |
With a (completely) server-based model you would first get a list of packages, choose one, send it back to server, the server would have to calculate your dependencies, send the list back, you would check which ones you already have installed, request ebuilds for the rest, etc. Yes, this can be optimized a bit. Still, the load of the servers would increase - there are much more users than there are servers. |
Would an SQL implementation of Portage be considered, if one were submitted? Or is there any progress on a new way of storing the Portage tree? _________________ Support Eachother, Copy Dutch Property!
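For a feel of what an SQL-backed store could look like, here is a Python sketch using the bundled sqlite3 module and an in-memory database. The two-table schema, the helper names and the sample rows are all hypothetical; the example only shows that a question like "what depends on sys-libs/zlib?" becomes a single indexed query. Note that the sqlite3 module relies on the SQLite C library, so this does add a dependency.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per ebuild, one row per dependency edge.
SCHEMA = """
CREATE TABLE ebuild (cpv TEXT PRIMARY KEY, slot TEXT, keywords TEXT);
CREATE TABLE depend (cpv TEXT REFERENCES ebuild(cpv), atom TEXT);
CREATE INDEX depend_atom ON depend(atom);
"""


def open_db(ebuilds: List[Tuple[str, str, str]],
            deps: List[Tuple[str, str]]) -> sqlite3.Connection:
    """Build an in-memory database from (cpv, slot, keywords) and (cpv, atom) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO ebuild VALUES (?, ?, ?)", ebuilds)
    conn.executemany("INSERT INTO depend VALUES (?, ?)", deps)
    conn.commit()
    return conn


def reverse_deps(conn: sqlite3.Connection, atom: str) -> List[str]:
    """Which ebuilds list `atom` as a dependency? One indexed query, no tree walk."""
    rows = conn.execute(
        "SELECT cpv FROM depend WHERE atom = ? ORDER BY cpv", (atom,))
    return [cpv for (cpv,) in rows]
```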
Gentree Watchman
Joined: 01 Jul 2003 Posts: 5350 Location: France, Old Europe
Posted: Mon Aug 21, 2006 8:56 pm
Quote: | Then you can ditch stuff you don't ever need (such as i18n categories if you only use english |
LOL, the guy is called xiaosuo, he indicates he's in China, and his English is ... not fluent. _________________ Linux, because I'd rather own a free OS than steal one that's not worth paying for.
Gentoo because I'm a masochist
AthlonXP-M on A7N8X. Portage ~x86
tuam l33t
Joined: 04 May 2004 Posts: 765 Location: CGN, Germany
Posted: Tue Aug 22, 2006 7:01 am Post subject: Re: Sql?
Skinkie wrote: | Would a SQL implementation of portage be tried, if submitted? |
Probably not, because portage *must* work on every box. Think of broken or segfaulting libraries.
FF,
Daniel _________________ Logic clearly dictates that the needs of the many outweigh the needs of the few. - Spock
The needs of the one outweigh the needs of the many. - Kirk
I refuse to let arithmetic decide questions like that. - Picard
Skinkie n00b
Joined: 13 Nov 2004 Posts: 27 Location: The Netherlands
Posted: Wed Aug 23, 2006 3:53 am Post subject: Re: Sql?
tuam wrote: | Skinkie wrote: | Would a SQL implementation of portage be tried, if submitted? |
Probably not, because portage *must* work on every box. Think of broken or segfaulting libraries.
FF,
Daniel |
What about a broken Python installation... you have dependencies one way or the other. If a major speed and/or size improvement can be made... I really would like to hear an 'official' opinion about this.
yngwin Retired Dev
Joined: 19 Dec 2002 Posts: 4572 Location: Suzhou, China
Posted: Wed Aug 23, 2006 5:58 pm
Code: | $ du -sk /home/portage/
196044 /home/portage/ |
That is the real size of my current portage tree. I use /home/distfiles for sources, so those are excluded. And you should know that /home is a Reiser4 partition. I don't see how such an essential part of Gentoo taking up <200MB of hard drive space would be an issue, especially since 40GB drives are considered small nowadays.
The more important issue, IMO, is speed. Why do I need to use eix and fquery instead of the original tools? Why is calculating dependencies taking so long? Maybe emerge & friends should be rewritten in (I know it's not cool) OCaml or (shudder) C? Even (the horror!) Perl would most probably be faster. _________________ "Those who deny freedom to others deserve it not for themselves." - Abraham Lincoln
Free Culture | Defective by Design | EFF
Naib Watchman
Joined: 21 May 2004 Posts: 6051 Location: Removed by Neddy
Posted: Wed Aug 23, 2006 6:20 pm
1 GB out of a HD that is, say, how big?
Sure, it is quite "expensive" if the HD is 10 GB in size, but with hard drive costs really low it's not too much of an issue. _________________
Quote: | Removed by Chiitoo |
Gentree Watchman
Joined: 01 Jul 2003 Posts: 5350 Location: France, Old Europe
Posted: Wed Aug 23, 2006 7:03 pm
Yes, emerge -p is a bit sluggish, but you will spend more time discussing how it could be improved than the sum total of all the seconds it makes you wait.
So discussing speed is a waste of time and discussing disk usage is a waste of space.
That just about wraps this topic up. Not that the original question wasn't a fair one; these things should be asked.
Now where's the "stop watching this..."
yngwin Retired Dev
Joined: 19 Dec 2002 Posts: 4572 Location: Suzhou, China
Posted: Thu Aug 24, 2006 11:31 am
Gentree wrote: | yes emerge -p is a bit sluggish but you will spend more time discussing how it could be improved that the sum total of all the seconds it makes you wait. |
I disagree, especially when you consider that any improvement would benefit all Gentoo users, not just the few who are discussing it here.
tracker n00b
Joined: 04 Jan 2004 Posts: 7 Location: United States
Posted: Sat Aug 26, 2006 12:55 am
yngwin wrote: | That is the real size of my current portage tree. I use /home/distfiles for sources, so those are excluded. And you should know that /home is a Reiser4 partition. I don't see how such an essential part of Gentoo taking up <200MB of hard drive space would be an issue. Especially since 40GB drives are nowadays considered small. |
I see beauty in the fact that ebuilds are plain text and are effectively (or were, as designed) shell scripts. Utilities such as vim and less can recognize compressed files and decompress them automatically when opening them, and as a logical next step, I don't see why Portage shouldn't be able to do the same. That would put more pressure on having a nice metadata format, as dependency resolution would otherwise require decompression for every package checked. Moving on.
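A Python sketch of the transparent decompression tracker suggests, assuming ebuilds could be stored plain, gzip- or bzip2-compressed: detect the format from the file's leading magic bytes (the approach file(1) takes) rather than from its name. read_ebuild is a hypothetical helper, not a Portage function:

```python
import bz2
import gzip

# Leading "magic" bytes of the supported compressed formats.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def read_ebuild(path: str) -> str:
    """Read an ebuild stored plain, gzip- or bzip2-compressed, chosen by content."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(GZIP_MAGIC):
        data = gzip.decompress(data)
    elif data.startswith(BZIP2_MAGIC):
        data = bz2.decompress(data)
    return data.decode("utf-8")
```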
Another approach would be using a plug-in based architecture for portage. You could use the standard filesystem back-end for those of us who want off-line access to the repository, or you could come up with various other interfaces. SOAP would be my recommendation for the solution xiaosuo is hinting at.
yngwin wrote: | The more important issue, IMO, is speed. Why do I need to use eix and fquery instead of the original tools? Why is calculating dependancies taking so long? Maybe emerge & friends should be rewritten in (I know it's not cool) OCaml or (shudder) C? Even (the horror!) perl would most probably be faster. |
Changing the implementation language isn't going to solve a high-level problem, which this most likely is. Profiling emerge while it updates world would be the first step toward solving it, as it would tell us just what is slow.
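Profiling is easy to try because Portage is Python. The sketch below uses the standard cProfile and pstats modules on a hypothetical stand-in workload (resolve and slow_lookup are made up, not emerge code); the report lists the functions that account for the most cumulative time:

```python
import cProfile
import io
import pstats


def slow_lookup(i: int) -> int:
    """Hypothetical per-package work standing in for whatever emerge does per atom."""
    return sum(range(i % 100))


def resolve(n: int) -> int:
    """Hypothetical stand-in for a dependency calculation over n packages."""
    total = 0
    for i in range(n):
        total += slow_lookup(i)
    return total


def profile_report(func, *args, top: int = 5) -> str:
    """Run func(*args) under cProfile; return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(resolve, 10000))
```

For emerge itself, something like python -m cProfile -s cumulative /usr/bin/emerge -p world should work, assuming /usr/bin/emerge is a plain Python script on your system.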
And if you were going to re-implement it in a different language, do it in ruby.
-- My random thoughts, for now. _________________ --Tracker
Not to be confused with BitTorrent
volkris n00b
Joined: 26 May 2002 Posts: 36
Posted: Sat Aug 26, 2006 7:53 am
Zarhan wrote: | Then you only need to host the files once for all your local computers. |
...and if I only have a single local computer?
Naib wrote: | 1Gig out of say a HD that is how big? |
In my case, 4 GB. Feel free to send me a new laptop hard drive. Email me for my mailing address.
Skinkie wrote: | What about a broken python installation... you have dependancies one way or the other. If a major speed and/or size improvement can be made... I really would like to hear an 'official' opinion about this. |
The official position, I believe, is that Python can be the only dependency. It's sort of a "put all your eggs in one basket and watch that basket" approach. Anyway, which is more likely to break, Python or Python + some SQL backend?
In any case, Portage seems like a very good candidate for an XML database rather than SQL.
neysx Retired Dev
Joined: 27 Jan 2003 Posts: 795
Posted: Mon Aug 28, 2006 3:19 pm Post subject: Re: [advice about portage]portage waste lots of diskspace
xiaosuo wrote: | today i typed:
Code: | du -sk portage/
889096 portage/ | the portage directory waste lots of diskplace |
Full portage tree, nothing excluded:
Code: | # du -shx /usr/portage/
251M /usr/portage/ |
An ext2 partition with 1K blocks is all it takes.
Skinkie n00b
Joined: 13 Nov 2004 Posts: 27 Location: The Netherlands
Posted: Fri Sep 01, 2006 10:37 pm Post subject: Re: [advice about portage]portage waste lots of diskspace
neysx wrote: | xiaosuo wrote: | today i typed:
Code: | du -sk portage/
889096 portage/ | the portage directory waste lots of diskplace |
Full portage tree, nothing excluded: Code: | # du -shx /usr/portage/
251M /usr/portage/ | An ext2 partition with 1K blocks is all it takes. |
In that case, Portage in a single 300MB loopback file could work too...
Gergan Penkov Veteran
Joined: 17 Jul 2004 Posts: 1464 Location: das kleinste Kuhdorf Deutschlands (Germany's tiniest backwater village) :)
Posted: Fri Sep 01, 2006 11:01 pm
Quote: | ls -lAF portage.sqsh
-rw------- 1 root root 42131456 2006-08-31 21:43 portage.sqsh |
_________________ "I knew when an angel whispered into my ear,
You gotta get him away, yeah
Hey little bitch!
Be glad you finally walked away or you may have not lived another day."
Godsmack
Skinkie n00b
Joined: 13 Nov 2004 Posts: 27 Location: The Netherlands
Posted: Sat Sep 02, 2006 1:17 pm
Gergan Penkov wrote: | Quote: | ls -lAF portage.sqsh
-rw------- 1 root root 42131456 2006-08-31 21:43 portage.sqsh |
|
If you compare this file size (42131456 bytes, about 40 MiB) to the space wasted on filesystems with 4K blocks, omg...