Gentoo Forums
[advice about portage]portage waste lots of diskspace
xiaosuo
n00b

Joined: 01 Apr 2004
Posts: 47
Location: china

Posted: Sat Jan 01, 2005 10:27 am    Post subject: [advice about portage]portage waste lots of diskspace

Today I typed:
Code:
du -sk portage/
889096  portage/

The portage directory wastes a lot of disk space, so I think we should change how portage works. We can learn from Debian: an update just downloads the list of packages and their dependencies. We do not need to download all of the ebuilds. We only download the ebuilds we really need to emerge, and delete them once the install succeeds. I think this is not difficult to implement. By the way, wouldn't this method also be good for the sync servers? Don't you think so?
_________________
I love freedom
and I want to fly
so I love linux
because I love linux
I choose gentoo


Last edited by xiaosuo on Sun Jan 02, 2005 2:53 am; edited 1 time in total

seank
l33t

Joined: 08 Jul 2004
Posts: 686

Posted: Sat Jan 01, 2005 10:40 am

Code:
[~] # du -sk /usr/portage
433704   /usr/portage
[~] #


You prolly have source tarballs in /usr/portage/distfiles, which are safe to remove, but they can come in handy sometimes.
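
To see what would go before deleting anything, something like the following might help. It is only a sketch: the name stale_distfiles and the 90-day cutoff are made up for illustration, and it judges files purely by modification time, which is not how portage itself decides what is current.

```shell
#!/bin/sh
# stale_distfiles DIR DAYS
# List regular files directly under DIR that were last modified more than
# DAYS days ago. It only prints names; nothing is deleted.
# (Hypothetical helper, not part of portage.)
stale_distfiles() {
    find "$1" -maxdepth 1 -type f -mtime +"$2"
}

# Example: tarballs untouched for 90 days (path as used in this thread).
# stale_distfiles /usr/portage/distfiles 90
```

Review the list by hand first; anything you still want for a rebuild without net access should stay.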

Zarhan
l33t

Joined: 27 Feb 2004
Posts: 924

Posted: Sat Jan 01, 2005 11:50 am    Post subject: Re: portage waste lots of diskspace

xiaosuo wrote:
Today I typed:
Code:
du -sk portage/
889096  portage/

The portage directory wastes a lot of disk space, so I think we should change how portage works.


As noted by the other poster, you probably have all the source tarballs that you downloaded in portage/distfiles. You can safely remove them.

Still too much waste?

http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=3&chap=5 - check out the "Excluding packages/categories" section. Then you can ditch stuff you don't ever need (such as i18n categories if you only use English), etc.

jbc28
Apprentice

Joined: 07 Jan 2003
Posts: 205
Location: Edinburgh

Posted: Sat Jan 01, 2005 12:08 pm

Just downloading an ebuild list won't work without some alterations as the dependencies of an ebuild are potentially dependent on its USE flag settings.
Downloading a list of ebuilds without any further info (no dependency information) might be better: portage could then just fetch the ebuilds as needed, though it would still need to know the status of each package (x86, ~x86, -x86 etc). The problem then would be if you're left without net access but have access to precompiled binaries as I think the *.tbz files need the relevant ebuilds. Perhaps the solution to this would be to add the appropriate ebuild into the tbz, that way it works as a stand alone.
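
To make the USE-flag point concrete, here is a rough sketch of how the same dependency string gives different package lists depending on the flags. The function name, the dependency string and the package atoms are only illustrative, and it handles just flat "flag? ( ... )" groups (no nesting, no "!flag?", no "||"), so it is far from what portage really does.

```shell
#!/bin/sh
# active_deps "DEPEND-STRING" "ENABLED USE FLAGS"
# Print the atoms that apply for the given flags, one per line.
# The body runs in a subshell so that "set -f" (no globbing, needed because
# tokens such as "ssl?" contain a glob character) does not leak out.
# (Hypothetical helper for illustration, not portage code.)
active_deps() (
    set -f
    use=" $2 "
    keep=1                      # 1 while atoms should be printed
    for tok in $1; do
        case $tok in
            *\?)                # "flag?" opens a conditional group
                flag=${tok%\?}
                case $use in
                    *" $flag "*) keep=1 ;;
                    *)           keep=0 ;;
                esac
                ;;
            "(")                # keep was already set by the flag before it
                ;;
            ")")                # group closed, back to unconditional atoms
                keep=1
                ;;
            *)
                if [ "$keep" -eq 1 ]; then
                    echo "$tok"
                fi
                ;;
        esac
    done
)

# Made-up dependency string.
deps='sys-libs/zlib ssl? ( dev-libs/openssl ) X? ( x11-libs/libX11 )'
active_deps "$deps" "ssl"      # prints sys-libs/zlib and dev-libs/openssl
active_deps "$deps" "ssl X"    # additionally prints x11-libs/libX11
```

So a bare package list is not enough on its own: the client needs at least the dependency strings and its own USE settings to work out what to fetch.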

I'm tempted to agree that portage does use a lot of disk space, but more importantly it perhaps uses more bandwidth than is necessary. Maybe the addition of a 'fetch-ebuild' flag could fetch ebuilds as needed?

J

Torangan
Apprentice

Joined: 21 Mar 2003
Posts: 163

Posted: Sat Jan 01, 2005 4:09 pm

If you'd like to save bandwidth, you'd probably do better to suggest a feature that uses patches instead of full downloads whenever possible. That way portage could save a really great amount of bandwidth. I think it should be possible to offer patches against the previous version, which must reside in distfiles, and then create a new tarball out of the previous version and the patch.

xiaosuo
n00b

Joined: 01 Apr 2004
Posts: 47
Location: china

Posted: Sat Jan 01, 2005 4:55 pm

I am sorry for forgetting the directory "/usr/portage/distfiles",
but you can see:
Code:
xiaosuo@center portage $ du -sk distfiles
445524  distfiles
xiaosuo@center portage $ cd ..
xiaosuo@center usr $ du -sk portage
890412  portage

You can see the ebuilds really use lots of disk space!
http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=3&chap=5
The method in the above link is not good!
I think we should change, because our portage tree keeps growing bigger and bigger, and this really is a waste of disk space, even though we have large hard disks. Consider that some software uses only n kbytes.
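
As a quick check on those numbers, subtracting the distfiles figure from the total gives what the tree itself occupies. The variable names below are just for illustration; the values are copied from the du output above.

```shell
#!/bin/sh
# Figures copied from the du output above (both in KB).
total_kb=890412        # du -sk portage
distfiles_kb=445524    # du -sk distfiles

# What remains is the tree itself: ebuilds, metadata, eclasses and so on.
tree_kb=$((total_kb - distfiles_kb))
echo "tree without distfiles: ${tree_kb} KB"   # prints 444888 KB
```

That is roughly half of the total, so the tree and the downloaded sources weigh about the same here.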
_________________
I love freedom
and I want to fly
so I love linux
because I love linux
I choose gentoo

oberyno
Guru

Joined: 15 Feb 2004
Posts: 467
Location: /bin/zsh

Posted: Sat Jan 01, 2005 9:32 pm

Rsync_excludes works for me.
Code:
 head  /etc/portage/rsync_excludes
**app-antivirus
**app-benchmarks
**app-forensics
**app-gnustep
**app-i18n
**app-laptop
**app-pda
**app-xemacs
**dev-ada
**dev-embedded
Note that having ** at the beginning tells emerge --sync to ignore the metadata for those categories as well.

Also, you might want to think about using reiser4 for /usr/portage. For example, media-sound takes up 2980 bytes on my reiser4 partition. On my ext2 partition, it takes up 3922 bytes. This is due to reiser4 "slumming" small files into the same node. link
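
One way to see how much of a directory's footprint is block slack rather than data is to compare apparent size with allocated size. This is a sketch that assumes GNU du (for --apparent-size); slack_report is a made-up name, and the numbers will depend on the filesystem.

```shell
#!/bin/sh
# slack_report DIR
# Print the bytes actually stored in files (apparent size) next to the
# space allocated on disk, both in KB. Assumes GNU du.
# (Hypothetical helper for illustration.)
slack_report() {
    apparent_kb=$(du -sk --apparent-size "$1" | cut -f1)
    allocated_kb=$(du -sk "$1" | cut -f1)
    echo "apparent=${apparent_kb}KB allocated=${allocated_kb}KB"
}

# Example (category path as mentioned above):
# slack_report /usr/portage/media-sound
```

With thousands of tiny ebuilds, the gap between the two figures is mostly the filesystem's doing, which is why the choice of filesystem and block size matters so much for /usr/portage.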

xiaosuo
n00b

Joined: 01 Apr 2004
Posts: 47
Location: china

Posted: Sun Jan 02, 2005 2:50 am

I use ext3. Using reiser4 is a good idea, but not the ultimate solution.
If we change how portage works, we just need to download each ebuild's KEYWORDS, SLOT, USE, DEPEND, ..., and we can design a good data structure for this ebuild information, compress the file with lzo (a fast compressor), and download that instead of syncing the whole portage tree.
We would not need the portage cache, because the lzo file is also the cache. This method can speed up searching and dependency calculation.
With so much benefit, WHY NOT change?
_________________
I love freedom
and I want to fly
so I love linux
because I love linux
I choose gentoo

Zarhan
l33t

Joined: 27 Feb 2004
Posts: 924

Posted: Sun Jan 02, 2005 12:12 pm

If the diskspace issue is such a big deal, then just share the /usr/portage directory over NFS (or any other network filesystem). Then you only need to host the files once for all your local computers.

Support for diffs (i.e. when you have kernel 2.6.9-sources and upgrade to 2.6.10, portage downloads only the patchset, not the entire source package) will probably be in soon (there are already multiple approaches, search these forums). That should solve some of the bandwidth usage issues.

Also, if you want a compressed version, just use portage snapshots (get the tarball). Emerge-webrsync does just what you propose, it downloads a "snapshot" tarball. Maybe you can access it directly via squashfs or something instead of decompressing it, but it's there.

This is the way it has been since forever, and by "forever" I'm also including the FreeBSD ports tree. Granted, FreeBSD offers binary packages for far more packages and they get updated more often than 4 times a year, but still, it's a good system.

The balance on which parts to host on the client (your) computer and which on the network was decided long ago and I doubt it will change. There are guidelines, such as what kinds of files you can include in the /usr/portage/<category>/<package>/files directory (small patches can be handily stored there) or when you should just put a URL into the ebuild for download. Shifting more of the stuff to the server side - well, you'll suddenly need more powerful servers.

Also, in its current form portage is really simple, as in elegant. It's a notch up from FreeBSD's ports, which is really just a collection of makefiles, and handles a few things better (such as SLOTs). The point is that the network transactions caused by portage (or ports) are really simple: just wget a file via http from a world of mirrors. No database access, nothing special.

With a (completely) server-based model you would first get a list of packages, choose one, send it back to the server, the server would have to calculate your dependencies, send the list back, you would check which ones you already have installed, request ebuilds for the rest, etc. Yes, this can be optimized a bit. Still, the load on the servers would increase - there are many more users than there are servers.

You see, even if you store the dependency info locally (the IUSE, RDEPEND lines) and can perform the dep calculations locally - it would increase complexity and server load. Right now, the only stuff that needs to be downloaded are the STANDARD source tarballs - directly from the software authors with a gazillion of mirrors. If the rest of the ebuild information has to be downloaded separately for each emerge - then you'll either need "gentoo-specific" source tarballs with the information attached - OR centralized ebuild servers. Considering that there are probably like millions of Gentoo users emerging this and that every minute, these servers would be under heavy load constantly. So there would have to be a heavy mirror infrastructure - just for Gentoo - not usable by anyone else. Rsync is much better - you just sync to it like once a week - and you can even host your own local rsync mirror for your own network.

Considering all these possibilities to squeeze portage into a smaller footprint (hosting over NFS, rsync_excludes for those games-* categories you don't want, cleaning up your distfiles, the possibility of using compressing filesystems), I think that this will not change in the foreseeable future.

tcbounce
Tux's lil' helper

Joined: 18 Nov 2003
Posts: 85
Location: South Korea

Posted: Mon Jan 17, 2005 7:10 am    Post subject: The best fix for this by far :)

Hi,

I have 4 computers on a LAN where I live and only one of them has a portage tree :) I do the NFS mount things as well, but with one benefit.

If you run SELinux when writing to NFS shares (after making a package) portage will die. The fix to this is to use the experimental NFS patch and NFS tools, then your security permissions are done automatically over the network.

I have downloaded some modifications to portage that clean out the old distfiles, if the file is not current for the architectures (ARCH_KEYWORDS) I use.

Finally, /usr/portage, /usr/portage/distfiles and /usr/portage/packages are not the only things you should be sharing!

I share my /etc/portage and my portage overlays over NFS, so I can have heaps of custom ebuilds and don't have to tweak my portage settings on any other computers. It's like software installation profiles under M$, but better!

If the machines are separated by some distance, instead of NFS you should set up a portage rsync mirror for your tree and other portage files (overlays & conf), and use portage's PORTAGE_BINHOST feature to install the packages via FTP.

There is a nice howto about how to create a portage binary mirror. I'll post it to the forum in about 10 mins when I find it, about how to automatically clean out distfiles and packages. (or at least packages - there are other scripts for distfiles).

ttuttle
Tux's lil' helper

Joined: 03 Oct 2004
Posts: 131

Posted: Sun Jan 23, 2005 6:03 am

xiaosuo wrote:
I use ext3. Using reiser4 is a good idea, but not the ultimate solution.
If we change how portage works, we just need to download each ebuild's KEYWORDS, SLOT, USE, DEPEND, ..., and we can design a good data structure for this ebuild information, compress the file with lzo (a fast compressor), and download that instead of syncing the whole portage tree.
We would not need the portage cache, because the lzo file is also the cache. This method can speed up searching and dependency calculation.
With so much benefit, WHY NOT change?


I think you're being a little quick to suggest gigantic changes to Portage. One benefit of the current system that you completely miss is offline building. By putting all the ebuilds on the server and downloading them as needed, you can't rebuild a package without an internet connection. With the current system, however, the ebuilds and distfiles (until you delete them) are available and can be used to recompile the package at any time.

Skinkie
n00b

Joined: 13 Nov 2004
Posts: 27
Location: The Netherlands

Posted: Mon Aug 21, 2006 5:19 pm    Post subject: Sql?

Zarhan wrote:

With a (completely) server-based model you would first get a list of packages, choose one, send it back to the server, the server would have to calculate your dependencies, send the list back, you would check which ones you already have installed, request ebuilds for the rest, etc. Yes, this can be optimized a bit. Still, the load on the servers would increase - there are many more users than there are servers.

Would a SQL implementation of portage be tried, if submitted? Or is there some progress on a new way of storing portage?
_________________
Support Eachother, Copy Dutch Property!

Gentree
Watchman

Joined: 01 Jul 2003
Posts: 5343
Location: France, Old Europe

Posted: Mon Aug 21, 2006 8:56 pm

Quote:
Then you can ditch stuff you don't ever need (such as i18n categories if you only use English
LOL, the guy is called xiaosuo, he indicates he's in China, and his English is ... not fluent. :?
_________________
Linux, because I'd rather own a free OS than steal one that's not worth paying for.
Gentoo because I'm a masochist
AthlonXP-M on A7N8X. Portage ~x86

tuam
l33t

Joined: 04 May 2004
Posts: 763
Location: CGN, Germany

Posted: Tue Aug 22, 2006 7:01 am    Post subject: Re: Sql?

Skinkie wrote:
Would a SQL implementation of portage be tried, if submitted?

Probably not, because portage *must* work on every box. Think of broken or segfaulting libraries.

FF,

Daniel
_________________
Logic clearly dictates that the needs of the many outweigh the needs of the few. - Spock
The needs of the one outweigh the needs of the many. - Kirk
I refuse to let arithmetic decide questions like that. - Picard

Skinkie
n00b

Joined: 13 Nov 2004
Posts: 27
Location: The Netherlands

Posted: Wed Aug 23, 2006 3:53 am    Post subject: Re: Sql?

tuam wrote:
Skinkie wrote:
Would a SQL implementation of portage be tried, if submitted?

Probably not, because portage *must* work on every box. Think of broken or segfaulting libraries.

FF,

Daniel

What about a broken python installation... you have dependencies one way or the other. If a major speed and/or size improvement can be made... I really would like to hear an 'official' opinion about this.
_________________
Support Eachother, Copy Dutch Property!

yngwin
Developer

Joined: 19 Dec 2002
Posts: 4542
Location: Suzhou, China

Posted: Wed Aug 23, 2006 5:58 pm

Code:
$ du -sk /home/portage/
196044  /home/portage/

That is the real size of my current portage tree. I use /home/distfiles for sources, so those are excluded. And you should know that /home is a Reiser4 partition. I don't see how such an essential part of Gentoo taking up <200MB of hard drive space would be an issue. Especially since 40GB drives are nowadays considered small.

The more important issue, IMO, is speed. Why do I need to use eix and fquery instead of the original tools? Why is calculating dependencies taking so long? Maybe emerge & friends should be rewritten in (I know it's not cool) OCaml or (shudder) C? Even (the horror!) perl would most probably be faster.
_________________
"Those who deny freedom to others deserve it not for themselves." - Abraham Lincoln
Free Culture | Defective by Design | EFF

Naib
Advocate

Joined: 21 May 2004
Posts: 4429
Location: Removed by Neddy

Posted: Wed Aug 23, 2006 6:20 pm

1Gig out of say a HD that is how big?

Sure, it is quite "expensive" if the HD is 10gig in size, but with hard drive costs really low it's not too much of an issue.
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king

Gentree
Watchman

Joined: 01 Jul 2003
Posts: 5343
Location: France, Old Europe

Posted: Wed Aug 23, 2006 7:03 pm

Yes, emerge -p is a bit sluggish, but you will spend more time discussing how it could be improved than the sum total of all the seconds it makes you wait.

So discussing speed is a waste of time and discussing disk usage is a waste of space.

Just about wraps this topic up. Not that the original question was not a fair one, these things should be asked.

Now where's the "stop watching this..." :wink:
_________________
Linux, because I'd rather own a free OS than steal one that's not worth paying for.
Gentoo because I'm a masochist
AthlonXP-M on A7N8X. Portage ~x86

yngwin
Developer

Joined: 19 Dec 2002
Posts: 4542
Location: Suzhou, China

Posted: Thu Aug 24, 2006 11:31 am

Gentree wrote:
Yes, emerge -p is a bit sluggish, but you will spend more time discussing how it could be improved than the sum total of all the seconds it makes you wait.

I disagree. Especially when you think that any improvement would benefit all Gentoo users, not just the few that are discussing it here.
_________________
"Those who deny freedom to others deserve it not for themselves." - Abraham Lincoln
Free Culture | Defective by Design | EFF

tracker
n00b

Joined: 04 Jan 2004
Posts: 7
Location: United States

Posted: Sat Aug 26, 2006 12:55 am

yngwin wrote:
That is the real size of my current portage tree. I use /home/distfiles for sources, so those are excluded. And you should know that /home is a Reiser4 partition. I don't see how such an essential part of Gentoo taking up <200MB of hard drive space would be an issue. Especially since 40GB drives are nowadays considered small.


I see beauty in the fact that ebuilds are in plain text, and are effectively (or were, at their design) shell scripts. Utilities such as vim and less can recognize compressed files and auto-decompress them when opening them, and as a logical direction to go, I don't see why portage shouldn't be able to do the same. That would put more pressure on a nice metadata format, as dependency resolution would require decompression for every package checked. Moving on.

Another approach would be using a plug-in based architecture for portage. You could use the standard filesystem back-end for those of us who want off-line access to the repository, or you could come up with various other interfaces. SOAP would be my recommendation for the solution xiaosuo is hinting at.

yngwin wrote:
The more important issue, IMO, is speed. Why do I need to use eix and fquery instead of the original tools? Why is calculating dependencies taking so long? Maybe emerge & friends should be rewritten in (I know it's not cool) OCaml or (shudder) C? Even (the horror!) perl would most probably be faster.


Changing the language of implementation isn't going to solve a high-level problem, which this most likely is. Profiling emerge while it's updating world would be the first step to solving this problem, as it'll let us know just what's being slow.

And if you were going to re-implement it in a different language, do it in ruby.

-- My random thoughts, for now.
_________________
--Tracker

Not to be confused with BitTorrent

volkris
n00b

Joined: 26 May 2002
Posts: 36

Posted: Sat Aug 26, 2006 7:53 am

Zarhan wrote:
Then you only need to host the files once for all your local computers.


...and if I only have a single local computer?

Naib wrote:
1Gig out of say a HD that is how big?


In my case, 4. Feel free to send me a new laptop harddrive. Email me for my mailing address :)

Skinkie wrote:
What about a broken python installation... you have dependencies one way or the other. If a major speed and/or size improvement can be made... I really would like to hear an 'official' opinion about this.


The official position, I believe, is that python can be the only dependency. It's sort of a "put all your eggs in one basket and watch that basket" approach. In any case, which is more likely to break, python or python + some sql backend?

In any case, portage seems like a very good candidate for an XML database rather than SQL.

neysx
Retired Dev

Joined: 27 Jan 2003
Posts: 795

Posted: Mon Aug 28, 2006 3:19 pm    Post subject: Re: [advice about portage]portage waste lots of diskspace

xiaosuo wrote:
Today I typed:
Code:
du -sk portage/
889096  portage/
The portage directory wastes a lot of disk space

Full portage tree, nothing excluded:
Code:
# du -shx /usr/portage/
251M    /usr/portage/
An ext2 partition with 1K blocks is all it takes.
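
The reason block size matters so much: a file occupies whole blocks, so its on-disk size is its length rounded up to the next multiple of the block size. A minimal sketch of that arithmetic follows; allocated_bytes is a made-up helper and the 700-byte ebuild is a hypothetical example. It ignores inodes, directory entries and tail packing.

```shell
#!/bin/sh
# allocated_bytes SIZE BLOCK
# Round SIZE (bytes) up to a whole number of BLOCK-sized blocks.
# (Hypothetical helper for illustration.)
allocated_bytes() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block * block ))
}

# A hypothetical 700-byte ebuild:
allocated_bytes 700 1024    # prints 1024
allocated_bytes 700 4096    # prints 4096
```

Multiply that difference by the number of small files in the tree and the gap between a 1K-block and a 4K-block filesystem adds up quickly.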

Skinkie
n00b

Joined: 13 Nov 2004
Posts: 27
Location: The Netherlands

Posted: Fri Sep 01, 2006 10:37 pm    Post subject: Re: [advice about portage]portage waste lots of diskspace

neysx wrote:
xiaosuo wrote:
Today I typed:
Code:
du -sk portage/
889096  portage/
The portage directory wastes a lot of disk space

Full portage tree, nothing excluded:
Code:
# du -shx /usr/portage/
251M    /usr/portage/
An ext2 partition with 1K blocks is all it takes.

In that case portage in one loopback file of 300MB could work too...
_________________
Support Eachother, Copy Dutch Property!

Gergan Penkov
Veteran

Joined: 17 Jul 2004
Posts: 1464
Location: das kleinste Kuhdorf Deutschlands :)

Posted: Fri Sep 01, 2006 11:01 pm

Quote:
ls -lAF portage.sqsh
-rw------- 1 root root 42131456 2006-08-31 21:43 portage.sqsh

_________________
"I knew when an angel whispered into my ear,
You gotta get him away, yeah
Hey little bitch!
Be glad you finally walked away or you may have not lived another day."
Godsmack

Skinkie
n00b

Joined: 13 Nov 2004
Posts: 27
Location: The Netherlands

Posted: Sat Sep 02, 2006 1:17 pm

Gergan Penkov wrote:
Quote:
ls -lAF portage.sqsh
-rw------- 1 root root 42131456 2006-08-31 21:43 portage.sqsh

If you compare this filesize to the size wasted on 4k systems omg...
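
To put a rough number on it, the sketch below compares the squashfs image size quoted above with the 251M full tree reported earlier in the thread. The two figures come from different machines a few days apart, and du -h rounds, so treat the result as a ballpark only; the variable names are just for illustration.

```shell
#!/bin/sh
# Sizes quoted in this thread.
sqsh_bytes=42131456          # ls -l portage.sqsh
tree_kb=$((251 * 1024))      # du -sh reported 251M (rounded), ext2 with 1K blocks

sqsh_kb=$((sqsh_bytes / 1024))
sqsh_mib=$((sqsh_kb / 1024))
percent=$((sqsh_kb * 100 / tree_kb))

echo "squashfs image: ${sqsh_mib} MiB, about ${percent}% of the 251M tree"
# prints: squashfs image: 40 MiB, about 16% of the 251M tree
```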
_________________
Support Eachother, Copy Dutch Property!

All times are GMT
Page 1 of 2