Gentoo Forums :: Portage & Programming
Avoiding Recompilation
mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

PostPosted: Sun Aug 03, 2014 2:35 pm    Post subject: Avoiding Recompilation

The intention of this thread is to summarize some ideas from a recent discussion on the gentoo-dev mailing list and on a related bug, so that the discussion can continue here without spamming that list with details that not everybody is interested in.
In particular, the topic I want to discuss here is partially independent of those other discussions.

Description of the problem

Originally, the problem arose in the discussion about the behaviour of portage and other package managers with respect to static (vs. dynamic) dependencies, see e.g. this WikiPage. Roughly speaking, this discussion was about: should portage take the dependency information from /var/db or from the current tree? (Currently the behaviour is mixed, which leads to various problems.)

It turned out that it might be a good idea to have a method to "bump" a package or at least to update its data in /var/db and .tbz2 files in a rather controlled way, if possible without recompiling the whole package.
Of course, the recompilation can only be skipped in certain cases if the ebuild maintainer knows exactly why he wants to skip it.
Perhaps one can find a solution which is useful in more cases than just the current "static vs. dynamic deps" problem.

Although the whole discussion originally appeared in the context of the "static vs. dynamic deps" debate, in the author's opinion the suggestions under 2. below are of independent interest and are useful independently of the outcome of that debate: they could avoid redundant recompilations in either case.

Suggested solutions
  1. Ignore the problem and live with redundant recompilations. This has some variants:
    a. If tree policy decides to support static deps fully, the number of unnecessary recompilations will probably increase sharply.
    b. If tree policy decides on a half-hearted support of static deps, there might soon be problems of packages not actually being updated although they should be.
    c. If tree policy decides to stay with dynamic deps, portage has to be fixed, since currently it falls back to static deps in many cases. This fix of portage is not trivial, and nobody is volunteering to write it.
  2. Use some mechanism to tell portage that certain bumps can be done without recompilation. This also comes in some variants (explained in more detail below):
    a. Use minor revisions
    b. Use a new metadata variable
    c. Use other variables
    d. Invent some other mechanism (special file, entry in metadata.xml, etc.)
  3. Use other mechanisms to update the dependencies. For instance:
    a. Update by some pre-defined rules
    b. Extend the current pkgmove mechanism dramatically
    c. Invent some other mechanism.

Some details/remarks:

3.a means that trivial changes like the dependency rewrite foo/bar -> foo/bar:0 are simply "copied" into the installed-package data; for other changes (e.g. adding foo/bar:=) one might need more complicated rules, e.g. different behaviour depending on whether the addition happens within || ( ... ) or within an "and" dependency; the latter should perhaps require recompilation, etc.
The disadvantage of this method is that it is hard to find rules which are correct in all cases, and so again unnecessary recompilations have to be forced in some cases.
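To make 3.a slightly more concrete, here is a minimal hand-written sketch of applying such a trivial rule to one installed package; the package name is made up, and editing /var/db directly like this is of course not an existing, supported mechanism:
Code:
#!/bin/bash
# Sketch only: apply the trivial rule "foo/bar -> foo/bar:0" to the RDEPEND
# recorded in the vdb for one (hypothetical) installed package, leaving the
# installed files untouched.
vdb=/var/db/pkg/app-misc/consumer-1.0
out=()
for tok in $(<"${vdb}/RDEPEND"); do
    [[ ${tok} == foo/bar ]] && tok=foo/bar:0   # the rule itself
    out+=("${tok}")
done
printf '%s\n' "${out[*]}" > "${vdb}/RDEPEND"
# Atoms with version operators, USE conditionals or || ( ... ) groups already
# need smarter rules than this plain token match - exactly the drawback above.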

3.b has the disadvantage of being limited to a certain "update language" which will probably never be able to cover all cases. Moreover, the database will probably grow permanently, and entries cannot easily be removed. Difficulties arise if some change should be "undone" later on (which for pkgmoves is explicitly forbidden).
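For comparison, the existing update files are already a tiny such "update language"; something along the following lines is what 3.b would amount to (the move/slotmove lines are the real profiles/updates syntax of today, the "depmove" line is purely invented, and the file name is only an example):
Code:
# the move/slotmove lines below are today's real profiles/updates syntax;
# "depmove" is an invented extension of the kind 3.b would need
cat >> profiles/updates/3Q-2014 <<'EOF'
move media-libs/old-name media-libs/new-name
slotmove =dev-libs/libfoo-2* 0 2
depmove foo/bar foo/bar:0
EOF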

2.a means that a new version syntax is introduced which allows "subrevisions", e.g. foo/bar-4711-r1.2. If updating from foo/bar-4711-r1.1 or foo/bar-4711-r1, the package manager is allowed to skip the phases "unpack", "prepare", "configure", "compile", "install", "merge" (incl. "remove") but will act almost as if the package were reinstalled: instead of actually merging files, it will just take over the previous /var/db/pkg/foo/bar-4711-r*/CONTENTS file.
Of course, it is completely up to the ebuild maintainer to judge whether this behaviour is really correct in all cases.
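Roughly, such a sub-revision "merge" would amount to no more than the following hand-written sketch (reusing the example version from above; a real implementation would of course also have to refresh the remaining metadata files consistently and remove the old entry):
Code:
# Sketch of a metadata-only "merge" from foo/bar-4711-r1.1 to -r1.2:
old=/var/db/pkg/foo/bar-4711-r1.1
new=/var/db/pkg/foo/bar-4711-r1.2
mkdir -p "${new}"
cp "${old}/CONTENTS" "${new}/CONTENTS"   # the installed files are unchanged
# dependency strings and the ebuild itself would be taken from the tree here,
# and only afterwards would the old vdb entry be dropped.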

2.b means that there is a new variable (whose name might be discussed here) with e.g. a syntax similar to DEPEND: if upgrading from a version mentioned in this variable, the phases mentioned in 2.a can be skipped by the package manager. In a more complex setting, even USE flags might be used in that variable, meaning that the corresponding part of the variable only becomes active (or fails to become active) if the upgrade is from an installation with the corresponding USE flag.
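A hypothetical ebuild fragment, only to illustrate the idea; the variable name and the dependency shown are invented here and, as said, open for discussion:
Code:
# fragment of a hypothetical foo/bar-4711-r2.ebuild; NO_RECOMPILE_FROM is not
# a real variable, and per the remark below it would need a new EAPI to define it
RDEPEND="dev-libs/libbaz:="   # the changed dependency that triggered the bump
# upgrades from the versions listed here may skip unpack/prepare/configure/
# compile/install and merely refresh the recorded metadata:
NO_RECOMPILE_FROM="=foo/bar-4711-r1"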

2.c means that some arbitrary variable name is used which is treated similarly to 2.b. In contrast to 2.a and 2.b this would not require an EAPI bump of the package, but it appears somewhat hackish.

All variants of 2. have the advantage/disadvantage that they might be used/misused to propagate other changes as well (like USE flags) without recompilation. The latter is reasonable only in the extended variant of 2.b and would require much care from the ebuild maintainer.
All variants of 2. have the disadvantage that mistakes by the ebuild maintainer can be rather severe and cause very subtle problems.

With 2.a there is the problem that subrevisions are already used by "sub"-distributions of gentoo (e.g. by prefix-portage). Also, it would require tools to be updated. (eix can do it, but other tools are probably not yet prepared for such changes).

With 2.b and 2.c there is the problem that the maintainer of the ebuild could easily forget that the variable needs to be updated, and that e.g. repoman cannot know whether an unchanged content of this variable is intended or a mistake. An idea to avoid this problem is to always require a certain change in this variable if it should be kept (e.g. by requiring that the current package version must be written as a first [otherwise ignored] word into this variable, or that the name of the variable should contain the revision). These latter suggestions, however, are probably rather confusing and inelegant.

Edit: Added some explanations/links according to the subsequent suggestion.


Last edited by mv on Mon Aug 04, 2014 2:32 pm; edited 2 times in total

Genone
Retired Dev
Joined: 14 Mar 2003
Posts: 9532
Location: beyond the rim

PostPosted: Mon Aug 04, 2014 12:28 pm

If you want a discussion it might help if you define what the "static vs. dynamic dependencies" issue is or at least link to the relevant thread. Because I have no clue what that is about.

Regarding 2), the usual problem with "special" upgrades is always that it is not a property of a single CPV but the property of the relevant upgrade path. So whatever mechanism is used has to specify at least a "last compatible" version. Using minor revisions with special semantics is IMO a bad idea, as sooner or later people will use it for other purposes and then request more special casing (people always get creative when it comes to versioning).

mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

PostPosted: Mon Aug 04, 2014 2:34 pm

Genone wrote:
If you want a discussion it might help if you define what the "static vs. dynamic dependencies" issue is or at least link to the relevant thread.

Thanks. Although I see the two topics as only partially related, I added the 3 relevant links where all details about it can be found: the discussion on the gentoo dev-ml, the related bug, and, perhaps most important, the dynamic deps WikiPage.
I also added a brief explanation about the issue to the text, pointing out several times why the two problems are not directly related.

Genone
Retired Dev
Joined: 14 Mar 2003
Posts: 9532
Location: beyond the rim

PostPosted: Tue Aug 05, 2014 7:17 am

OMG, that "dynamic-deps" thing must be one of the worst ideas in Gentoo ever (plus the name is totally stupid). If people need a mechanism to "patch" metadata after installation they should come up with just that and not start adding broken semantics to core package manager systems.
So if I understand the original issue correctly (only checked the Wiki page for now) basically all that is needed is to replace vdb metadata with live ebuild metadata without actual remerge? Then a mix of 3b and 3c sounds the most appropriate solution to me ("global updates" have always been a nasty hack and could do with a proper redesign).

I understand you want to discuss the no-recompile proposal decoupled from the deps issue, but to me the whole idea of "updating" a package without changing its payload sounds conceptually wrong, so I'd rather get the underlying issue fixed properly than adding more hacks on top of an already overcomplicated system (well, I would if I had any business with Gentoo still).

mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

PostPosted: Tue Aug 05, 2014 2:59 pm

Genone wrote:
I'd rather get the underlying issue fixed properly

The underlying issue cannot be fixed properly. I try to summarize the corresponding discussion once more in the following, although I emphasize once more that I see this as an independent topic.

The "static vs. dynamic issue" comes from the conflict that you have installed packages according to one "snapshot" state of the package tree but that the package tree evolves: Libraries get split, get merged, get replaced by other libraries (sometimes only partially) etc. - lots of variants can and do arise.
Whenever this happens, all packages which do (or might optionally) use any of the corresponding libraries might have become incompatible with the tree: if you use the static information (from /var/db), it might cause all sorts of conflicts and blockers against the current tree and hinder updates (where in some cases you do not even see that these updates are hindered), and if you use dynamic information (from the tree) your dependency information might not match the state of the installed files.
At first glance, both problems have "workarounds": the former, if every package with a changed dependency gets revbumped (at the cost of a lot of additional recompilations compared to now); the latter, if you make sure that you run emerge -NDu @world before any emerge --depclean.
Both of these strategies work most of the time, but both can fail in certain cases: the former if e.g. the revbump does not get "pushed" to the user because some circular dependency chain prevents the bump; for the latter I cannot recall an example at the moment, but one can certainly construct cases.
Both of these strategies fail horribly if you have packages installed for which there is no longer a maintained ebuild: In the static case the bump would not happen, and in the dynamic case you have no chance to replace one library by another if that would require recompilation.
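For clarity, the "dynamic deps" workaround mentioned above is simply the usual ordering, nothing more:
Code:
# refresh everything against the current tree before depclean may remove anything
emerge --sync
emerge -avuDN @world
emerge --ask --depclean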

To summarize from a "philosophical" point of view: the conflict between having (static) packages installed and wanting to update from a (dynamically changing) tree can never be solved satisfactorily. You have to choose the solution which causes the fewest issues. Since the needs of users differ, it is not possible to find a strategy which will make everyone happy. This is why this discussion has become highly emotional and political.

That's why I would prefer to not continue this discussion in this thread: All technical arguments have essentially been exchanged in the dev-ml discussion. One needs to find a compromise and establish a decision in one way or another.

A possibility to update /var/db in a controlled way without recompiling is especially useful if you have static deps and must revbump hundreds of packages due to a trivial library change. That's why suggestion 2.a came up first. However, a possibility to update e.g. your binpkgs to the current state of the tree, or e.g. to add a trivial USE change (from nonexistent to non-set) in certain cases without recompilation, is handy with dynamic deps, too. Of course, this can be allowed only in a way controlled by the ebuild maintainer, when he knows exactly that adding a deactivated USE flag does not modify the installed files: I would probably already have saved thousands of recompilations if this feature had existed earlier.

Quote:
Then a mix of 3b and 3c sounds the most appropriate solution to me ("global updates" have always been a nasty hack and could do with a proper redesign)

The problem with this is the amount of data: if you want to avoid the restrictions of a limited rule set as in 3.a, you have to remember more or less all changes of all dependencies forever, i.e. you need a rather large part of the full CVS/git history of the portage tree and of all overlays when you want to update. Or do you see another possibility?

Quote:
Regarding 2), the usual problem with "special" upgrades is always that it is not a property of a single CPV but the property of the relevant upgrade path.

The upgrade path is built into the suggestions: In case 2.a the upgrade path is "from the same revision but a lower minor revision".
In case 2.b or c the upgrade path is more flexible: It is the content of the variable mentioned.
In all other cases, a recompilation is done as usual.
Note that the user can easily force recompilation if for some reason he does not like the ebuild-maintainer's decision: He can just re-emerge the same version, and a re-emerge will of course recompile (by definition).
Quote:
as sooner or later people will use it for other purposes and then request more special casing

I am not sure whether having such a restriction is then an advantage or a disadvantage: It is of course a disadvantage that some possibilities are lost, but one should be aware that this feature is very delicate and needs much care, because it can cause very subtle issues. So "restricting" the creativity of developers here is perhaps not the worst thing which can be done.

Genone
Retired Dev
Joined: 14 Mar 2003
Posts: 9532
Location: beyond the rim

PostPosted: Wed Aug 06, 2014 8:54 am

I understood the problem, and with "the underlying issue" I actually meant the "update outdated vardb metadata". Just disagree with adding special hacks for abusing the remerge mechanics to solve it (and totally disagree with using tree metadata for installed packages for that matter) which I assumed you wanted to focus on.

As the existing "global update" operations fall into the same category it would seem smart to integrate this, but as you can't replace pkgmoves with standard ebuild operations (neither "dynamic deps" nor "metadata-only remerge"), the obvious suggestion for me would be to come up with a system to apply such tree changes to vardb in a consistent and transparent way. Now I haven't put much thought into that subject yet and I know it's far from trivial, so for now I'm just pointing out what I consider problems in the proposals so far.

Quote:
So "restricting" the creativity of developers here is perhaps not the worst thing which can be done.

What I tried to say there was that this doesn't work, never has. History has shown that people will bend, break or ignore such rules (not always intentionally) and then complain if it doesn't work as they intended and request further special casing to fit their use case. As you pointed out, subrevisions are already used in other contexts anyway, causing compatibility issues, and adding special semantics to them would further reduce any chance to change the versioning syntax in the future.

mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

PostPosted: Wed Aug 06, 2014 9:39 am

Genone wrote:
the obvious suggestion for me would be to come up with a system to apply such tree changes to vardb in a consistent and transparent way.

This is easy to say in an abstract way, but what exactly do you mean by it?
Is there really something more consistent and transparent than a version bump? (If only it did not have the horrible disadvantage of requiring a lot of time to actually recompile the packages.)
Do not forget that you need to cover a lot of cases: possibly dependency updates of the form "foo/bar:=" -> "foo/bar2:=", possibly additions/removals happening only in some deeper || ( a/b || ( ... ) ) clauses, etc.

Genone
Retired Dev
Joined: 14 Mar 2003
Posts: 9532
Location: beyond the rim

PostPosted: Wed Aug 06, 2014 10:23 am

As said, I haven't put a lot of thought into it yet. Lets get to some common ground first to isolate the specific requirements before dabbling into solutions. What is needed is a mechanism to
a) replace a vardb ebuild (and derived metadata files) with its updated tree counterpart
b) potentially propagate the results to reverse dependencies (ideally not necessary as maintainers will update all affected ebuilds, and could cause problems)
c) apply further tree modifications (renames specifically) to the vardb state

Currently, ignoring that idiotic dynamic-deps behavior, remerges (including recompile) deal with a) and pkgmove instructions in global updates with c). slotmoves (do those still exist?) belong to group a) and b).
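Just to make (a) concrete, the manual equivalent would be something like the sketch below, for one hypothetical installed package and with the exact portageq invocation taken with a grain of salt; a real tool would also have to regenerate all derived metadata consistently:
Code:
c=app-misc pn=example pv=1.0
cp "/usr/portage/${c}/${pn}/${pn}-${pv}.ebuild" \
   "/var/db/pkg/${c}/${pn}-${pv}/${pn}-${pv}.ebuild"
# refresh e.g. the recorded RDEPEND from the tree (illustrative only):
portageq metadata / ebuild "${c}/${pn}-${pv}" RDEPEND \
    > "/var/db/pkg/${c}/${pn}-${pv}/RDEPEND"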

Can we agree on that so far?

mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

PostPosted: Wed Aug 06, 2014 11:46 am

Genone wrote:
What is needed is a mechanism to
a) replace a vardb ebuild (and derived metadata files) with its updated tree counterpart
b) potentially propagate the results to reverse dependencies (ideally not necessary as maintainers will update all affected ebuilds, and could cause problems)
c) apply further tree modifications (renames specifically) to the vardb state

b) is not necessary - this must be done by the ebuilds using it.

steveL
Watchman
Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Wed Sep 10, 2014 2:19 am

Hey mv, sorry to post so late, I've been quite busy and only caught up on ML a bit today, when I saw your response in mutt.
mv wrote:
Genone wrote:
I'd rather get the underlying issue fixed properly

The underlying issue cannot be fixed properly. I try to summarize the corresponding discussion once more in the following, although I emphasize once more that I see this as an independent topic.

Actually it can. It just requires acknowledgement that there's a glaring hole in the original portage design, which no-one wants to do.
Quote:
The "static vs. dynamic issue" comes from the conflict that you have installed packages according to one "snapshot" state of the package tree but that the package tree evolves: Libraries get split, get merged, get replaced by other libraries (sometimes only partially) etc. - lots of variants can and do arise.

As you state (and Genone concurs) the underlying issue is libraries and ABI breakage. The first step in tracking that is to recognise that runtime library dependencies are critical to the ongoing operation of the machine, and to express that metadata accordingly. The BSDs have been doing this for years, with LIB_DEPENDS.

From there you use the standard GNU binutils etc, to track the information we're tracking now, which we always knew we'd have to track, about linkage and soname changes. Given that base, you extend the toolkit with tools to extract, process and verify metadata about things like struct layout, alignment and symbol exposure (most of which are already written). Essentially you provide a QA service to the upstreams, which Gentoo already does manually, at the same time as you provide assurance to your downstream users that their machines aren't going to be broken by an upgrade without warning.
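To be clear about what "the information we're tracking now" looks like at the lowest level, it is nothing exotic, just the DT_NEEDED and SONAME entries in the ELF headers; the paths here are only examples:
Code:
for f in /usr/lib64/libfoo.so.1 /usr/bin/foo; do
    echo "== ${f}"
    readelf -d "${f}" | grep -E 'NEEDED|SONAME'
done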

Those tools feed back to the developer before they commit, of course, and enable people like patrick who run tinderboxes, as well as downstream bindists, not to mention network admins, to provide much more in-depth QA to each other and their users.

There's another aspect to the discussion you're having, about updates to the vdb in a controlled fashion, but first and foremost we should be protecting end-user machines, since the same tools notify us of breakage before we commit, and to do so they give us the data we need, in order even to have the information to propagate in the first place; at least if we want that reliably.

However we won't do it by ignoring the notational problem, since (as my boss always keeps telling me;) good notation is central:
Kernighan & Pike wrote:
The power of notation comes from having a good one for each problem.. languages together are more powerful than any one of them in isolation.
It's worth breaking the job into pieces if it enables you to profit from the right notation.
("The Practice of Programming", 1999)

I'll have to come back to the other part about distributing updates, as I haven't followed everything you've written closely, and I need to get some sleep. Not ignoring it, it looks quite interesting. Nor is it exactly a hard problem, imo; the underlying library issue (and the completely inadequate design) is the actual root cause.

Regards,
steveL.

mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

PostPosted: Wed Sep 10, 2014 8:04 pm

steveL wrote:
From there you use the standard GNU binutils etc, to track the information we're tracking now

This is what portage's @preserved-rebuild already does, but this is not a solution:
First of all, GNU binutils only covers binary programs - interpreted languages or other logical dependencies (e.g. on some data files) are not caught by it.
Second, and more important, it does not solve the problem - which as I claimed is unsolvable:
If a package providing a library is removed, and a newly introduced package provides this library instead, no automatic mechanism can know this (unless you ship precompiled binaries or lists of libraries for each USE-flag combination).
And the difficulty is what to do in such a situation if at least the package maintainer is aware of such a situation (of course, he might make mistakes, but this is a further problem): Keeping the obsolete state is wrong - which is what static deps do - when you eventually want to update the machine. On the other hand, if you want to "cement" the current state and actually do not want to upgrade to the new package constellation, the dynamic deps will not do what you want.
IMHO, the attitude that you can "cement" the state of the machine while using a changing tree is the actual problem: Sooner or later it will bring you into a situation where it is practically impossible to cope with the current tree: You just postpone the problem until one day it is really unsolvable and practically requires a reinstallation. So far, nobody has experienced this, since portage does dynamic deps by default.
This is the reason why I so vehemently favour dynamic deps... But now we are again in the middle of a discussion which should not be the topic of the thread...

steveL
Watchman
Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Thu Sep 11, 2014 5:00 pm

mv wrote:
steveL wrote:
From there you use the standard GNU binutils etc, to track the information we're tracking now

This is what portage's @preserved-rebuild already does

That would be "the information we're tracking now". Portage uses it for that purpose; the point is that we have information about both the user's runtime system and the specific combinations of packages built under the USE flags the user specified, which they are about to install. There's more that can be done, but (much) more painfully if we're so unclear about library dependencies in our basic notation.
Quote:
but this is not a solution:
First of all, GNU binutils only covers binary programs - interpreted languages or other logical dependencies (e.g. on some data files) are not caught by it.

Hence the notational problem, which is still very much cogent in the realm of binary packages, since it's hard to reason about something so central, especially when coding, if you have no direct expression of it in the notation you use to do your thinking (you probably want the pdf if you don't already have it, but the html page may be preferred by some).

However you should bear in mind that it's always possible to bootstrap interpreters, so long as you have a fundamental set of binary libraries. IOW if we sort out the binary situation, the interpreter scenario is both done, and not a problem even if it should break down, since we can bootstrap the interpreters, but it's a lot harder to bootstrap the binary side. OFC neither is really an issue, since that's what prior, known-working livedisks are for, but in the hypothetical scenario of bringing up a toolchain that doesn't need perl to evaluate configure scripts, and even in it, properly-tracked ABI dependencies are a much more useful base. They're just much harder to get right with such a crippled notation.
Quote:
Second, and more important, it does not solve the problem - which as I claimed is unsolvable:
If a package providing a library is removed, and a newly introduced package provides this library instead, no automatic mechanism can know this (unless you ship precompiled binaries or lists of libraries for each USE-flag combination).
And the difficulty is what to do in such a situation if at least the package maintainer is aware of such a situation (of course, he might make mistakes, but this is a further problem):

Ah this is where I see the fundamental problem in your argument: you need to separate out what happens on the user machine from what happens in the tree. The latter flows from the former, and we have all the information we need to make decisions on the user's machine, even if usually that should be simply to bail out with decent information about the problem. Higher layers/other modules can then script around that however you like. First get the core working properly.

I wouldn't even begin to approach the tree side of things without the basis.
Quote:
Keeping the obsolete state is wrong - which is what static deps do - when you eventually want to update the machine. On the other hand, if you want to "cement" the current state and actually do not want to upgrade to the new package constellation, the dynamic deps will not do what you want.
IMHO, the attitude that you can "cement" the state of the machine while using a changing tree is the actual problem: Sooner or later it will bring you into a situation where it is practically impossible to cope with the current tree: You just postpone the problem until one day it is really unsolvable and practically requires a reinstallation.

It's certainly true that we have a "current state of the machine" before we emerge anything, and that includes information about all the binary libs installed, although it's not as good as it could be. The point is that if we were used to thinking in terms of "lib_depends" as a code concept, alongside rdepends and depends (which are "bdepends" in BSD terms) firstly both the latter sets would be a lot smaller, and secondly everyone would naturally be writing scripts to track lib_depends, in a transparent fashion, which makes cross-project collaboration, as well as actually dealing with this issue via automated tools, much more feasible.

Again, though your argument is weaker, imo, since it's not distinct.
Quote:
So far, nobody has experienced this, since portage does dynamic deps by default.
This is the reason why I so vehemently favour dynamic deps... But now we are again in the middle of a discussion which should not be the topic of the thread...

Well i appreciate you want to discuss mechanisms to update the vdb, and I agree there's fun to be had there. It's just not that difficult a problem, and in fact the real problem you wanted to discuss that as a solution to, is much more fundamental. But by all means, I'll enjoy that discussion too; I just won't believe it's a solution to the problem you describe, although I concede it would likely be the tail-end of one. Good luck getting the data.

mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

PostPosted: Fri Sep 12, 2014 8:41 am

steveL wrote:
However you should bear in mind that it's always possible to bootstrap interpreters

You are thinking about a running toolchain. But this should be kept separately IMHO. This is the problem of @system containing the proper things.
The discussion I am thinking about is more about the more advanced parts of the system where you have really complicated dependencies; the dependencies of the toolchain itself are not complicated and probably also do not change heavily over time.
Quote:
since we can bootstrap the interpreters, but it's a lot harder to bootstrap the binary side.

This is not true. For instance, you cannot bootstrap icedtea: You need a running java to compile icedtea. Dependencies in higher languages can be as severe as in C or even worse, especially in projects requiring several languages in the build system.
But the point is not whether some expert who has knowledge about all dependencies and manually works around issues can do it after many hours but whether it just works automatically: For a real expert, it makes no difference whether you have dynamic or static deps since he can just emerge everything in the required order with -O.
Quote:
Ah this is where I see the fundamental problem in your argument: you need to separate out what happens on the user machine from what happens in the tree.

Of course, this is the root of the problem and why the problem is unsolvable.
This separation is not a requirement which I artificially introduce, but which exists by its mere nature.
Actually, not what is "happening" but what "has happened", because you cannot see the history from the tree, and it would be too much data to store all possibly relevant information about the history in the tree: You have some state of packages on your machine and another possibly rather divergent (if the last sync was long ago) state in the tree which are vastly incompatible.
This is just the situation you have, and you have to deal with it. Then one solution ("dynamic deps") is to ignore the state of the machine as far as possible, and the other solution ("static deps") is to use only that part of the tree which is compatible and ignore the rest. Both solutions can lead to all sort of problems.
Quote:
even if usually that should be simply to bail out with decent information about the problem.

In complex dependency situations, it is impossible to give reasonable information: dependency resolving, especially when USE-flag changes and circular deps are involved, is a hard problem (perhaps even NP-hard?), and you can see from current portage output that the "located" problem is often actually the wrong place. There is not much hope that this can be improved: the only decent sources of information are the full dependency trees, and this information is still incomplete (since e.g. the libs which the USE flags will actually install cannot be seen). Yes, maybe one can extend the dependency language to include this information, but then it would be a very complex language, easy to make mistakes in, etc., and still not all information would be included (e.g. why it is impossible to bootstrap icedtea).
Even worse: with static deps there might appear to be no problem (everything can be resolved), although actually you might be using packages which are maintained only for compatibility reasons and not the actual "main" packages: these main packages might conflict with your current state, and thus you are not offered them but the "working" solution with the old packages is chosen automatically.
With dynamic deps, OTOH, you are more likely to be offered a solution which appears to work which then for some reason actually doesn't, because some of your packages would first have to be recompiled to use some new ABI.
Quote:
The point is that if we were used to thinking in terms of "lib_depends" as a code concept, alongside rdepends and depends

Which information is contained in lib_depends which is not already contained in rdepends and the rdepends string stored in /var/db? (Possible mistakes of ebuild maintainers set aside.)

steveL
Watchman
Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Fri Sep 12, 2014 10:46 am

mv wrote:
steveL wrote:
However you should bear in mind that it's always possible to bootstrap interpreters

You are thinking about a running toolchain. But this should be kept separately IMHO. This is the problem of @system containing the proper things.
The discussion I am thinking about is more about the more advanced parts of the system where you have really complicated dependencies; the dependencies of the toolchain itself are not complicated and probably also do not change heavily over time.

Well, your argument is all over the shop, then (that means "it keeps jumping from place to place".)

Firstly, toolchain is the basis we bootstrap from, as anyone who's done any cross-compilation knows. So that's essential, are we agreed, or not?

And in fact I was thinking of perl, nothing else, given the toolchain; as you say, the toolchain's not that complicated in dep-terms, and further we're used to handling it (and we automate handling of it in any case.)
Quote:
Quote:
since we can bootstrap the interpreters, but it's a lot harder to bootstrap the binary side.

This is not true. For instance, you cannot bootstrap icedtea: You need a running java to compile icedtea. Dependencies in higher languages can be as severe as in C or even worse, especially in projects requiring several languages in the build system.
But the point is not whether some expert who has knowledge about all dependencies and manually works around issues can do it after many hours but whether it just works automatically: For a real expert, it makes no difference whether you have dynamic or static deps since he can just emerge everything in the required order with -O.

And what do you think a script is about, if not encapsulating domain-specific knowledge?

Further I'd like to see you do any other package building without a toolchain sorted first.
Quote:
Quote:
Ah this is where I see the fundamental problem in your argument: you need to separate out what happens on the user machine from what happens in the tree.

Of course, this is the root of the problem and why the problem is unsolvable.
This separation is not a requirement which I artificially introduce, but which exists by its mere nature.

Huh? My point was that you're not actually dealing with the separation at all, but have been talking across it, with your argument jumping between what happens on the user machine and what happens in the tree.

So no, you didn't introduce it: I did, because it exists as you say, and because you've been ignoring it up till now.

And ffs stop with the unsolvable bit, since you tried that before with a completely different basis for being unsolvable (snapshot vs installed, both on the user machine.)
Quote:
Actually, not what is "happening" but what "has happened", because you cannot see the history from the tree, and it would be too much data to store all possibly relevant information about the history in the tree: You have some state of packages on your machine and another possibly rather divergent (if the last sync was long ago) state in the tree which are vastly incompatible.
This is just the situation you have, and you have to deal with it. Then one solution ("dynamic deps") is to ignore the state of the machine as far as possible, and the other solution ("static deps") is to use only that part of the tree which is compatible and ignore the rest. Both solutions can lead to all sort of problems.

*sigh* you're still convinced that you have a solid understanding of everything that's going on, and I'm sure you do: I just can't see it in the arguments you've presented. Both "solutions" suffer from a lack of decent information.
Quote:
Quote:
even if usually that should be simply to bail out with decent information about the problem.

In complex dependency situations, it is impossible to give reasonable information: dependency resolving, especially when USE-flag changes and circular deps are involved, is a hard problem (perhaps even NP-hard?), and you can see from current portage output that the "located" problem is often actually the wrong place. There is not much hope that this can be improved: the only decent sources of information are the full dependency trees, and this information is still incomplete (since e.g. the libs which the USE flags will actually install cannot be seen). Yes, maybe one can extend the dependency language to include this information, but then it would be a very complex language, easy to make mistakes in, etc., and still not all information would be included (e.g. why it is impossible to bootstrap icedtea).
Even worse: with static deps there might appear to be no problem (everything can be resolved), although actually you might be using packages which are maintained only for compatibility reasons and not the actual "main" packages: these main packages might conflict with your current state, and thus you are not offered them but the "working" solution with the old packages is chosen automatically.

Actually I meant information along the lines of "not installing cat-foo/pkg-fubar-0.1.1 as: libfubar-0.1.1.so breaks compatibility with libfubar-0.1.0.so but has the same soname" after you've done the make DESTDIR=.. install. On the user machine, based on what they have currently installed, first and foremost. Get that right, and it becomes a lot easier to do the distribution side. Without it, the distribution side won't happen anywhere near as effectively as it could.
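i.e. a check of roughly this shape, run against the image before it ever hits the live filesystem; the file names are made up and real ABI checking obviously needs more than exported symbol names, but the tools are all standard:
Code:
# compare dynamic symbols exported by the installed lib and the new image;
# complain if anything vanished while the SONAME stayed the same
old=/usr/lib64/libfubar.so.0
new="${D}/usr/lib64/libfubar.so.0"   # ${D} = image dir after src_install
removed=$(comm -23 \
    <(nm -D --defined-only "${old}" | awk '{print $3}' | sort -u) \
    <(nm -D --defined-only "${new}" | awk '{print $3}' | sort -u))
[[ -n ${removed} ]] && echo "libfubar.so.0: removed exported symbols:" ${removed}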

Not sure what all this "new language" stuff is about: it sounds like more invention along the lines of the crap that's holding the project back in so many ways, but since it's you I'm willing to accept you have something in mind. That doesn't mean it's needed.

It's odd that you can start to envisage a whole new language, but ignore basic notation. Did you even check the links I gave you? Am I to assume you've read Iverson's paper before, as well as "The Practice of Programming" and this is all old-hat to you? Then YTF haven't you even addressed that point? It's like you have a blind-spot, or are blinkered in your thinking, which I don't want to believe based on the respect I have for you, after what has been useful (for my work) interaction on these forums over several years.
Quote:
With dynamic deps, OTOH, you are more likely to be offered a solution which appears to work which then for some reason actually doesn't, because some of your packages would first have to be recompiled to use some new ABI.

The point I'm making is that it's pitiful that we still have not automated checks of that, and the reason for that is that lib_depends are only ever expressed indirectly, and hand-waving about revdep-rebuild has now morphed into hand-waving about (sub,)slot-operators instead of dealing with the notational problem.
Quote:
Quote:
The point is that if we were used to thinking in terms of "lib_depends" as a code concept, alongside rdepends and depends

Which information is contained in lib_depends which is not already contained in rdepends and the rdepends string stored in /var/db? (Possible mistakes of ebuild maintainers set aside.)

As so many Gentoo devs do, you're focussing on the problem that's right in front of you now, and ignoring the bigger picture. When I said "good luck getting the data", I hoped you'd see what I meant immediately (from experience): good luck managing that data flow, as well as processing it, over time.

To answer you directly: none, currently, though you have to extract it from other strings, and hope the format doesn't change any (but presumably that's ok in your world, as we're all tied to one codebase?) However once you have the concept, you can then extend the syntax from a clean base, having stipulated that certain operators (in the pre-LIB_DEPENDS sense: they'd be removed in a better design, most likely) apply by default. The point is that you treat a certain class of dependencies, the ones every upstream, every developer and every user, worry about, specially, because they are fundamental.

For an upstream it's something external they rely on, and need to inform the user about on their "build from source" page, and if not: in their "dependencies" or "other software" section.

For a user it's something they need to get the software they want working, it's something that breaks (or breaks something else, apparently unconnected for a novice) if upgrades are not done carefully, and for some reason it's never been handled with any kind of nous. Instead we've had years of unnecessary revdep-rebuilds, and now we're given a turd of a design because no-one has the guts to face down McCreesh, so his frankly woeful ideas gain traction. (EAPI-in-suffix very nearly made it through that "rigorous" review process, if you recall.)

For a developer, it's important because of the above, and because no matter how arrogant or brainwashed they may be, no-one wants to field more bug-reports than they have to, and let's be realistic: it doesn't look very good when you don't provide decent upgrade paths, however much some may enjoy forcing users to use stuff they don't want.

Are we agreed they're fundamental, or do you wish to argue that point as well?

Does it not strike you as even remotely peculiar that the most fundamental form of dependency is not directly expressible?

Use your imagination to answer your own questions: imagine for one moment that LIB_DEPENDS are expressed directly in the ebuild, meaning that the packages listed are both RDEPENDS and DEPENDS in Gentoo terms: they must be built before this package, are required at runtime, and must be built in ROOT, ie for CHOST not CBUILD (ignoring toolchain, and focussing on the 99%.) They have an implicit binding, so no slot-operators are necessary. Taking the fundamental case first, we track any binary libraries in those packages, and run a few simple tools to gather ABI metadata, more than we do now. What would you do from that basis?
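So, purely as a thought experiment (none of this is existing Gentoo syntax, the variable name is simply borrowed from BSD ports, and the package names are placeholders):
Code:
# hypothetical ebuild fragment with a direct library-dependency notation
LIB_DEPENDS="dev-libs/libfoo dev-libs/libbar"  # linked-against libs: needed at
                                               # build time and runtime, built
                                               # for CHOST, implicitly ABI-bound
RDEPEND="app-misc/somedata"                    # remaining runtime-only deps
DEPEND="virtual/pkgconfig"                     # remaining build-only deps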

If you're going to ignore the notational aspects, and the documents I link you to, then you should just get on with the discussion about modifying the vdb, as I indicated in my prior post.

To get you back to the vdb thing:
Genone wrote:
As said, I haven't put a lot of thought into it yet. Lets get to some common ground first to isolate the specific requirements before dabbling into solutions. What is needed is a mechanism to
a) replace a vardb ebuild (and derived metadata files) with its updated tree counterpart
b) potentially propagate the results to reverse dependencies (ideally not necessary as maintainers will update all affected ebuilds, and could cause problems)
c) apply further tree modifications (renames specifically) to the vardb state

Currently, ignoring that idiotic dynamic-deps behavior, remerges (including recompile) deal with a) and pkgmove instructions in global updates with c). slotmoves (do those still exist?) belong to group a) and b).

Can we agree on that so far?

mv thinks b) should be done by ebuild authors and (afaict) that dynamic deps are good. I'll bow out for a while, noting only that dynamic-deps just need to be made smarter, based on more useful information which flows best from a smarter design. And then dynamic deps are the vdb updates.

steveL
Watchman
Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Fri Sep 12, 2014 4:22 pm

This patch looks like it will make dynamic-deps a lot smarter, in that they will take installed packages (info on the user machine) into account.

mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

PostPosted: Fri Sep 12, 2014 8:47 pm

steveL wrote:
Firstly, toolchain is the basis we bootstrap from, as anyone who's done any cross-compilation knows. So that's essential, are we agreed, or not?

Toolchain is essential for the system, but practically irrelevant for our discussion: in the foreseeable future nothing will change concerning the toolchain, no matter whether we have dynamic deps, static deps, some mechanism to update the database, etc. Simply put, the toolchain is a more-or-less static piece of software for which dependencies practically do not matter.
Quote:
And what do you think a script is about, if not encapsulating domain-specific knowledge?

I do not understand. Neither of us talked about any script before. Which script do you mean which does emerge's job on user's systems? Who wrote it? Why did it get domain-specific knowledge?
Quote:
I did, because it exists as you say, and because you've been ignoring it up till now.

Where did I ignore it? I said from the very beginning that you cannot "cement" the state of the machine without taking the changes of the tree into account. Of course, I assume here that the tree changes independently of the machine - that's why people use emerge --sync, and I supposed that this does not have to be explained.
Quote:
with a completely different basis for being unsolvable (snapshot vs installed, both on the user machine.)

snapshot vs. installed = tree vs. user's system. Exactly the separation you claim to be missing. Somehow there seems to be a severe misunderstanding between us.
Quote:
Both "solutions" suffer from a lack of decent information.

I am actually getting tired of repeating all the arguments over and over.
Moreover, this discussion really leads nowhere: a decision will be made concerning the dynamic or static deps default and the corresponding tree policy. This will be made by the council or by some (hopefully many) gentoo developers, and the discussion here will simply have no influence on that decision.
Therefore, I think further discussing this topic here is just a waste of time.
Quote:
Actually I meant information along the lines of "not installing cat-foo/pkg-fubar-0.1.1 as: libfubar-0.1.1.so breaks compatibility with libfubar-0.1.0.so but has the same soname" after you've done the make DESTDIR=.. install.

So you mean information for the ebuild writer, so that he does not forget to insert correct {R,}DEPENDS? This again seems to be a very different topic.
Quote:
Not sure what all this "new language" stuff is about

Probably this is another misunderstanding between us: I interpreted your links (apparently wrongly) as suggesting an extension of such a kind. (See below for why I understood it that way.)
Quote:
The point I'm making is that it's pitiful that we still have not automated checks of that, and the reason for that is that lib_depends are only ever expressed indirectly

Whether you express them in terms of lib_depends or more directly as RDEPENDS with (correct) subslots does not matter. The problem is what to do with this information; more precisely, what to do when the stored one and the one in the tree are incompatible. (Apart from that, as I already said and we seem to agree: storing the provided libs in the portage tree (in contrast to /var/db) is a task close to impossible.)
Quote:
As so many Gentoo devs do, you're focussing on the problem that's right in front of you now, and ignoring the bigger picture

There is no "bigger picture": we are talking about a package manager, not about the answer to life, the universe, and everything.
Especially for a PM compiling from source, all sorts of unexpected problems (also human mistakes) can and do occur from time to time. Trying to eliminate human errors by making the algorithm too smart is, as almost always, not the best solution: KISS. That's what made the success of ports/portage. IMHO, the subslots were already a step towards a too complicated solution and are actually the reason why the problems have become so severe: they were not properly implemented (forcing a mixture of static vs. dynamic deps).
Fortunately, it seems that Zac's patch you posted might solve that problem, finally.
Quote:
Does it not strike you as even remotely peculiar that the most fundamental form of dependency is not directly expressible?

No, because the provider is not directly expressible in a source-based distribution. What you have in mind works perfectly for a binary distribution (where the task of a PM is really easy. In fact, this is how rpm works, for instance).
However, in the presence of USE-flags and dozens of architectures, you cannot reasonably store this information in the tree (only in /var/db where it is not too useful).
If you want to store it in the tree, you need a complex new syntax (describing for which USE flags and which architecture which libraries will be provided by the package): this is why I interpreted your links as suggesting such a syntax.
Without such an extended complex language which allows the PM to recognize the provider, the information that you want "foo.so.0" is completely useless...
And that extended complex language calls for further human errors. One cannot autogenerate this information: The number of USE-flag combinations is too large. It is the developer who must analyze which combinations will install which libraries.
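Just as an illustration of the growth (obviously a crude upper bound, since flags are rarely fully independent):
Code:
# n independent USE flags allow 2^n combinations, each of which could
# install a different set of libraries
for n in 5 10 20; do echo "${n} USE flags -> $(( 2 ** n )) combinations"; done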