Gentoo Forums
emerge is non-deterministic?
steveL (Watchman)
Posted: Fri May 18, 2018 6:47 pm

steveL wrote:
But python works well as a high-level language, interfacing to C for speed where needed
mv wrote:
If all of the dependency resolver is written in C (or C++) then one can argue whether C (or C++) is actually the main implementation language of the pm.
If any part is not written in C/C++ then this part cannot be parallelised.
I don't think it really matters which one considers the main; what matters are the results.
And pkgcore is blazingly fast by comparison with portage (which is blazingly fast by comparison with paludis) and I trust ferringb on correctness over anyone else working in this space.

IIRC pkgcore was written with correctness in mind first; obviously with a hope of improving algorithmic efficiency, but mainly just to have a clean codebase: a reimplementation based on all the lessons learnt the first time around on the "second-system" of dubious bloat.

I just wonder what's keeping it from EAPI-6; if it's the bash (ebuild.sh) or eclass side, or the python side. (I found radhermit a bit prickly when I went to offer help a year or two ago, so I left him his space.)
I doubt eclasses are the problem; just not sure what could be so demanding of the python side. (it's been a couple of years.)
steveL (Watchman)
Posted: Fri May 18, 2018 6:49 pm

Genone wrote:
Constant input should lead to constant results for --pretend, no matter if the result is correct. Otherwise --pretend is broken by definition as you cannot rely on portage to actually show the action that would be performed.
Agreed.

And thanks to all for the explanations I should have considered as possibilities first time around.
haarp (Guru)
Posted: Fri May 18, 2018 8:40 pm

Bug filed: https://bugs.gentoo.org/656074
mv (Watchman)
Posted: Sat May 19, 2018 6:00 am

krinn wrote:
Users in need for it have to use the new one

Many users will be general-purpose libraries. It is correct that all these libraries use salted hash. In fact, there should actually be no need for unsalted hash at all.
However, it would be nice to have a way to initialize the random generator for the salt at the beginning of the program with a fixed value if the user needs reproducibility of a result (like in the current case).
Perhaps setting the seed of python's random function does have the required side effect?
(Although this would be somewhat disappointing, since it might mean that the salt is not cryptographically strong and thus might miss its purpose.)
Hu (Moderator)
Posted: Sat May 19, 2018 4:22 pm

mv wrote:
However, it would be nice to have a way to initialize the random generator for the salt at the beginning of the program with a fixed value if the user needs reproducibility of a result (like in the current case).
The developers thought of this. Set $PYTHONHASHSEED in the environment to influence the hash salt. If unset, salting is automatic. It needs to be set very early, since it is not safe to change the seed once it has been used to bias any hash lookups. Modifying it from within the Python script would almost certainly be too late.
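
To see the effect for yourself, here is a minimal sketch, assuming Python 3.7 or later. The helper name child_hash and the sample string are made up for illustration; nothing here is taken from portage. It starts fresh interpreters, because the salt is fixed at interpreter startup, and compares the string hash with and without PYTHONHASHSEED set.

```python
import os
import subprocess
import sys


def child_hash(text, seed=None):
    """Return hash(text) as computed by a freshly started Python interpreter.

    seed=None removes PYTHONHASHSEED from the child's environment, so the
    child picks its own random salt. Any other value is passed through.
    (child_hash is a hypothetical helper, not part of any library.)
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    name = "sys-apps/portage"  # arbitrary sample string
    # Without a seed the two values will almost always differ between runs.
    print("unset :", child_hash(name), child_hash(name))
    # With a fixed seed both child interpreters agree.
    print("seed=0:", child_hash(name, 0), child_hash(name, 0))
```

The seed has to be in the environment before the interpreter starts, which is why the sketch spawns child processes instead of assigning to os.environ in the running one.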
mv (Watchman)
Posted: Sat May 19, 2018 7:06 pm

Hu wrote:
Modifying it from within the Python script would almost certainly be too late.

s/almost//
Parsing the python script already needs hashing if the script contains any variables or functions.
Perhaps portage should obtain a wrapper which sets PYTHONHASHSEED.

Anyway, it would be interesting whether portage is still toggling after
Code:
export PYTHONHASHSEED=0
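
The wrapper idea could also be sketched in Python itself. This is only an illustration under stated assumptions: seeded_environment and reexec_with_fixed_seed are hypothetical names, portage does not ship anything like this as far as this thread shows, and the re-exec only makes sense for a program started as a plain script (sys.argv[0] is the script path).

```python
import os
import sys


def seeded_environment(environ, seed="0"):
    """Return a copy of environ with PYTHONHASHSEED set to seed.

    Returns None when the variable already has that value, meaning no
    re-exec is needed. (Hypothetical helper for illustration only.)
    """
    if environ.get("PYTHONHASHSEED") == seed:
        return None
    env = dict(environ)
    env["PYTHONHASHSEED"] = seed
    return env


def reexec_with_fixed_seed(seed="0"):
    """Restart the current script under a fixed hash seed, if not already set.

    The running interpreter cannot change its own salt, so the process is
    replaced with a new interpreter that sees the variable at startup.
    """
    env = seeded_environment(os.environ, seed)
    if env is not None:
        os.execve(sys.executable, [sys.executable] + sys.argv, env)
```

Calling reexec_with_fixed_seed() as the very first statement of a script would restart it once with the seed in place; on the second pass the check finds the variable already set and the script carries on.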
Genone (Retired Dev)
Posted: Tue May 22, 2018 11:09 am

mv wrote:
Perhaps portage should obtain a wrapper which sets PYTHONHASHSEED.

Not until the actual cause of this behavior is determined. Ideally portage should not rely on hashes being constant, e.g. all places where order matters should use an ordered data structure with explicit sorting (if that is the issue).
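
As a rough illustration of "explicit sorting" (not portage code; merge_order and the package names are invented for the example): iterating a set of strings follows hash order, which changes with the salt, while sorting first gives the same order on every run.

```python
def merge_order(packages):
    """Return the distinct package names in a stable, salt-independent order.

    Iterating set(packages) directly would follow string hash order, which
    can differ between interpreter runs; sorted() removes that dependency.
    (merge_order is a hypothetical helper for illustration only.)
    """
    return sorted(set(packages))


if __name__ == "__main__":
    candidates = ["dev-lang/python", "sys-apps/portage", "app-shells/bash",
                  "sys-apps/portage"]
    # Order of this list may change from run to run unless PYTHONHASHSEED is fixed.
    print("set order   :", list(set(candidates)))
    # Order of this list is the same on every run.
    print("sorted order:", merge_order(candidates))
```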
krinn wrote:
that's not a good point for python, i could understand they need to fix this, but changing a function like that is not good ; why didn't they create a new randomize function doing the same but with salt? Users in need for it have to use the new one and everyone is happy.

Because this isn't about explicit function calls but the internal identity of all objects. Of which there can only be one.
haarp (Guru)
Posted: Tue May 22, 2018 12:13 pm

fwiw, PYTHONHASHSEED=0 indeed appears to bring determinism :wink:
mv (Watchman)
Posted: Tue May 22, 2018 12:14 pm

Genone wrote:
explicit function calls but the internal identity of all objects. Of which there can only be one.

In theory, one could have made the hash function (i.e. seed) part of the object. Actually I thought first that's how it's implemented, because changing the seed for every object with a cryptographically strong one-way function seems to be the only secure way. Apparently, they have chosen the quicker way of using the same seed everywhere and never changing it. Of course, considering how often the same symbol has to be looked up in different tables in python, this simplification is a compromise to security which could be implemented without any time penalties.